ABSTRACT



Dynamical Aspects of Information Storage 
in Quantum-Mechanical Systems 

Maxim Raginsky 



We study information storage in noisy quantum registers and computers using the 
methods of statistical dynamics. We develop the concept of a strictly contractive quantum 
channel in order to construct mathematical models of physically realizable, i.e., nonideal, 
quantum registers and computers. Strictly contractive channels are simple enough, yet 
exhibit very interesting features, which are meaningful from the physical point of view. 
In particular, they allow us to incorporate the crucial assumption of finite precision of all 
experimentally realizable operations. Strict contractivity also helps us gain insight into 
the thermodynamics of noisy quantum evolutions (approach to equilibrium). Our investi- 
gation into thermodynamics focuses on the entropy-energy balance in quantum registers 
and computers under the influence of strictly contractive noise. Using entropy-energy 
methods, we are able to appraise the thermodynamical resources needed to maintain re- 
liable operation of the computer. We also obtain estimates of the largest tolerable error 
rate. Finally, we explore the possibility of going beyond the standard circuit model of 
error correction, namely constructing quantum memory devices on the basis of interacting 
particle systems at low temperatures. 






Acknowledgments 



Following the hallowed tradition, first I would like to thank my thesis advisor and mentor, 
Prof. Horace P. Yuen, who not only profoundly influenced my thinking and the course 
of my career as a graduate student, but also impressed upon me this very important 
lesson: in scientific research, one should never blindly defer to "authority," but rather 
work everything out for oneself. Next I would like to thank Profs. Prem Kumar and Selim 
M. Shahriar for serving on my final examination committee; Dr. Giacomo M. D'Ariano 
(University of Pavia, Italy) for serving on my qualifying examination committee and for a 
careful reading of this dissertation, which resulted in several improvements; Dr. Viacheslav 
Belavkin (University of Nottingham, United Kingdom) for interesting discussions; and Dr. 
Masanao Ozawa (Tohoku University, Japan) for valuable comments on my papers. I also 
gratefully acknowledge the support of the U.S. Army Research Office for funding my 
research through the MURI grant DAAD19-00-1-0177. 

Most of my research was conceived and done in the many coffeehouses of Evanston 
and Urbana-Champaign. Therefore some credit is due to the following fine establishments: 
in Evanston, the Potion Liquid Lounge (now unfortunately defunct), Unicorn Cafe, and 
Kafein; in Urbana-Champaign, the Green Street Coffeehouse and Cafe Kopi (which also 
serves alcohol). 

During the three years I have spent at Northwestern as a grad student, I have had a 
chance to meet some interesting characters, with whom it was a real pleasure to discuss the 
Meaning of Life and other, less substantial, matters, often over a pint or two of Guinness. 
These people are: Jeff Browning, Eric Corndorf, Yiftie Eisenberg, Vadim Moroz, Ranjith 
Nair, Boris Rubinstein, Jay Sharping, Brian Taylor, and Laura Tiefenbruck. Did I forget 
anyone? It is also a pleasure to thank my friends outside Northwestern, for believing in 
me and for being there. This one goes out to the high-school crew: Mark Friedgan, Alex 
Rozenblat, Mike Sandler, Ilya Sutin, and Arthur Tretyak. 

I owe a great deal of gratitude to my parents, Margarita and Anatoly Raginsky, who 
always encouraged my interest in science and mathematics, and to my brother Alex, with 
whom I made a bet that I would earn my doctorate by the time he graduated from high 
school. Fork over the fifty bucks, dude! And, last but not least, I would like to thank 
my parents-in-law, Rosa and Vladimir Lazebnik, and my sister-in-law, Masha, for their 
support. 

Finally, I must admit that above all I cherish and value the love of my wonderful wife 
Lana. I dedicate this dissertation to her. 






To Lana 






Contents 



ABSTRACT

Acknowledgments

1 Introduction

2 Basic notions of quantum information theory

2.1 Classical systems vs. quantum systems

2.1.1 Algebras of observables

2.1.2 Pure and mixed states

2.2 Channels

2.2.1 Definitions

2.2.2 Examples

2.2.3 The theorems of Stinespring and Kraus

2.2.4 Duality between channels and bipartite states

2.3 Distinguishability measures for states

2.3.1 Trace-norm distance

2.3.2 Jozsa-Uhlmann fidelity

2.3.3 Quantum detection theory

2.4 Distinguishability measures for channels

2.4.1 Norm of complete boundedness

2.4.2 Channel fidelity

3 Strictly contractive channels

3.1 Relaxation processes and channels

3.2 Strictly contractive channels

3.2.1 Definition

3.2.2 Examples

3.2.3 Strictly contractive channels on $\mathcal{S}(\mathbb{C}^2)$

3.2.4 The density theorem for strictly contractive channels

3.3 Strictly contractive dynamics of quantum registers and computers

3.4 Error correction and strictly contractive channels

3.4.1 The basics of quantum error correction

3.4.2 Impossibility of perfect error correction

3.4.3 Approximate error correction

3.5 Implications for quantum information processing

3.5.1 General considerations

3.5.2 Case study: ensemble quantum computation using nuclear magnetic resonance

3.5.3 Where do we go from here?

4 Entropy-energy arguments

4.1 Definition and properties of entropy

4.2 The Gibbs variational principle and thermodynamic stability

4.3 Entropy-energy arguments and quantum information theory

4.4 Entropy-energy balance and the maximum number of operations

4.5 Thermodynamic stability of large-scale quantum computers

4.6 Putting it all in perspective

5 Information storage in quantum spin systems

5.1 Toric codes and error correction on the physical level

5.2 Laying out the ingredients

5.3 Putting it together

5.4 Summary

6 Conclusion

Appendix A: Mathematical background

A.1 C*-algebras

A.2 States, representations, and the GNS construction

A.3 Trace ideals of $\mathcal{B}(\mathcal{H})$

A.4 Fixed-point theorems

Appendix B: List of symbols

Bibliography

Vita



List of Figures 



1.1 Orbits defined by input density operators $\rho$ and $\sigma$ in the state space $\mathcal{S}(\mathcal{H})$ 
of the quantum system with the Hilbert space $\mathcal{H}$ in the case of (a) noiseless 
(reversible, unitary) channel; and (b) noisy (irreversible, non-unitary) channel. 

2.1 Using entanglement to distinguish between quantum channels 

3.1 The effect of a strictly contractive channel $T$ on the state space $\mathcal{S}(\mathcal{H})$ of 
the quantum system 

5.1 Square lattice on a torus 






CHAPTER 1 



Introduction 



Quantum memory will be a key ingredient in any viable implementation of a quantum 
information-processing system (computer). However, because any quantum computer re- 
alized in a laboratory will necessarily be subject to the combined influence of environmental 
noise and unavoidable imprecisions in the preparation, manipulation, and measurement 
of quantum-mechanical states, reliable storage of quantum information will prove to be 
a daunting challenge. Indeed, some authors [95, 139] found that circuit-based quantum 
computation (i.e., a temporal sequence of local unitary transformations, or quantum gates 
[8]) is extremely vulnerable to noisy perturbations. The same noisy perturbations will also 
adversely affect information stored in quantum registers (e.g., between successive stages 
of a computation). 

Therefore, since it was first realized that maintaining reliable operation of a large- 
scale quantum computer would pose a formidable obstacle to any experimental realization 
thereof, many researchers have expended a considerable amount of effort devising various 
schemes for "stabilization of quantum information." These schemes include, e.g., quantum 
error-correcting codes [68], noiseless quantum codes [150], decoherence-free subspaces [78], 
and noiseless subsystems [69]. (The last three of these schemes boil down to essentially 
the same thing, but are arrived at by different means.) Each of these schemes relies for 
its efficacy upon explicit assumptions about the nature of the error mechanism. Quantum 
error-correcting codes [68], for instance, perform best when different qubits in the computer 
are affected by independent errors. On the other hand, stabilization strategies that are 
designed to handle collective errors [69, 78, 150] make extensive use of symmetry arguments 
in order to demonstrate existence of "noiseless subsystems" that are effectively decoupled 
from the environment, even though the computer as a whole certainly remains affected by 
errors. 

In a recent publication [118], Zanardi gave a unified description of all of the above- 
mentioned schemes via a common algebraic framework, thereby reducing the conditions for 
efficient stabilization of quantum information to those based on symmetry considerations. 
The validity of this framework will ultimately be decided by experiment, but it is also 
quite important to test its applicability in a theoretical setting that would require minimal 
assumptions about the exact nature of the error mechanism, and yet would serve as an 
abstract embodiment of the concept of a physically realizable (i.e., nonideal) quantum 
computer. 

In this respect, the assumption of finite precision of all physically realizable state 
preparation, manipulation, and registration procedures is particularly important, and can 
even be treated as an empirical given. This premise is general enough to subsume (a) 
fundamental limitations imposed by the laws of quantum physics (e.g., impossibility of 
reliable discrimination between any two density operators with nonorthogonal supports), 
(b) practical constraints imposed by the specific experimental setting (e.g., impossibility 
of synthesizing any quantum state or any quantum operation with arbitrary precision), 
and (c) environment-induced noise. 

As a rule, imprecisions in preparation and measurement procedures will give rise to 
imprecisions in the building blocks of the computer (quantum gates) because the precision 
of any experimental characterization of these gates will always be affected by the precision 
of preparation and measurement steps involved in any such characterization. Conversely, 
the precision of quantum gates will affect the precision of measurements because the 
closeness of conditional probability measures, conditioned on the gate used, is bounded 
above by the closeness of the two quantum gates [11]. 

Incorporation of the finite-precision assumption into the mathematical model of noisy 
quantum memories and computers has to proceed in two directions. On the one hand, we 
must characterize the sensitivity of quantum information-processing devices to small per- 
turbations of both states and operations. This is important for the following reasons. First, 
any unitary operation required for a particular computational task must be approximated 
by several unitary operations taken from the set of universal quantum gates [8]. Since any 
quantum computation is a long sequence of unitary operations, approximation errors will 
propagate in time, and the resultant state at the end of the computation will differ from 
the one that would be generated by the "ideal" computer. This issue was addressed by 
Bernstein and Vazirani [11], who found that if a sequence of gates $G_1, G_2, \ldots, G_n$ is 
approximated by the sequence $G'_1, G'_2, \ldots, G'_n$, where the $i$th approximating gate $G'_i$ differs from 
the "true" gate $G_i$ by $\epsilon_i$, then the corresponding resultant states will differ by at most 
$\epsilon_1 + \epsilon_2 + \cdots + \epsilon_n$. Secondly, in the case of noisy computation, each gate will be perturbed by 
noise, thus resulting in additional error. This situation was handled by Kitaev [66], with 
the same conclusion: errors accumulate at most linearly. Therefore, if we approximate 
the gates sufficiently closely, and if the noise is sufficiently weak, then we can hope that 
the resulting error in the output state will be small. The same reasoning can be applied 
to perturbations of initial states: if two states differ by $\epsilon$, then the corresponding output 
states will also differ by at most $\epsilon$. However, these conclusions are hardly surprising; they 
are, in fact, simple consequences of the continuity of quantum channels and expectation 
values. 
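The linear accumulation of gate errors can be checked numerically. The sketch below is only an illustration under assumptions of our own: closeness of gates is measured in the operator norm, the "ideal" gates are random unitaries, and each approximating gate is obtained by multiplying by $e^{iH}$ for a small random Hermitian $H$. The helpers `random_unitary` and `perturb` implement this hypothetical perturbation model; it is not the one used in [11] or [66].

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_unitary(d):
    """Random d x d unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescales column j of q by a unit-modulus phase; still unitary

def perturb(u, strength):
    """Hypothetical noise model: return exp(iH) u for a small random Hermitian H."""
    d = u.shape[0]
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    h = strength * (a + a.conj().T) / 2
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * w)) @ v.conj().T @ u

def op_norm(m):
    """Operator norm = largest singular value."""
    return np.linalg.norm(m, 2)

d, n = 4, 20
ideal = [random_unitary(d) for _ in range(n)]
approx = [perturb(g, 1e-2) for g in ideal]
eps = [op_norm(g - g_approx) for g, g_approx in zip(ideal, approx)]

# Compose the two circuits: U = G_n ... G_1 and V = G'_n ... G'_1.
U, V = np.eye(d), np.eye(d)
for g, g_approx in zip(ideal, approx):
    U, V = g @ U, g_approx @ V

total_error = op_norm(U - V)
print(f"circuit error {total_error:.4f} <= sum of gate errors {sum(eps):.4f}")
```

The inequality always holds. The reason is the telescoping identity $U - V = \sum_k G_n \cdots G_{k+1}(G_k - G'_k)G'_{k-1}\cdots G'_1$: every factor other than $G_k - G'_k$ is unitary and therefore has norm one.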

There is, on the other hand, another aspect of the noiseless/noisy dichotomy, which 
has so far been largely overlooked. Assuming for simplicity that all operations in the 
quantum system (register or computer) take place at integer times, each initial state 
(density operator) $\rho_0$ defines an orbit in the state space of the system, i.e., a sequence 
$\{\rho_n\}_{n \in \mathbb{N}}$, where $\rho_n$ is the state of the system at time $n$. According to the circuit model of a 
Figure 1.1: Orbits defined by input density operators $\rho$ and $\sigma$ in the state space $\mathcal{S}(\mathcal{H})$ 
of the quantum system with the Hilbert space $\mathcal{H}$ in the case of (a) noiseless (reversible, 
unitary) channel; and (b) noisy (irreversible, non-unitary) channel. 



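quantum computer, each time step of the computation is a unitary channel. Now consider 
a pair of initial states $\rho_0, \sigma_0$. Then, by unitary invariance of the trace norm (cf. Section 
2.3.1), we will have 

$$\|\rho_{n+1} - \sigma_{n+1}\|_1 = \|\rho_n - \sigma_n\|_1, \quad \forall n \in \mathbb{N}.$$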

In other words, the output states of a noiseless quantum system are distinguishable from 
one another exactly to the same extent as the corresponding input states. However, this 
is not the case for general (non-unitary) channels. Such channels are described by 
trace-preserving completely positive maps (cf. Ch. 2) and, for any such map $T$ on density 
operators, we have $\|T(\rho) - T(\sigma)\|_1 \le \|\rho - \sigma\|_1$. A noisy quantum system can be 
modeled by replacing a unitary channel at each time step with a general completely positive 
trace-preserving map. In this case, we will have 

$$\|\rho_{n+1} - \sigma_{n+1}\|_1 \le \|\rho_n - \sigma_n\|_1, \quad \forall n \in \mathbb{N},$$

whence we see that, for the case of a noisy quantum system, the output states are generally 
less distinguishable from one another than the corresponding input states. Furthermore, 
distinguishability can only decrease with each time step. In other words, the distance 
between two disjoint orbits in the state space of the system will remain constant in the 
absence of noise, and shrink when noise is present. Both situations are depicted in Fig. 1.1. 
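The two situations in Fig. 1.1 can be illustrated numerically. The choices here are ours: a qubit, a real rotation as the unitary channel, and amplitude damping with a hypothetical strength $\gamma = 0.3$ as the noisy channel, given by its Kraus operators (cf. Ch. 2). The difference of two density operators is Hermitian, so its trace norm is the sum of the absolute values of its eigenvalues.

```python
import numpy as np

def trace_norm(m):
    """Trace norm of a Hermitian matrix: sum of |eigenvalues|."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(m))))

def apply_kraus(kraus, rho):
    """Channel in Kraus form: rho -> sum_k K rho K^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)

def pure(psi):
    """Density operator |psi><psi| of a (normalized) vector."""
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

rho = pure([1, 0])    # |0><0|
sigma = pure([1, 1])  # |+><+|

theta = 0.7
u = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]], dtype=complex)
unitary = [u]

gamma = 0.3  # hypothetical damping strength
amp_damp = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
            np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

d0 = trace_norm(rho - sigma)
d_unitary = trace_norm(apply_kraus(unitary, rho) - apply_kraus(unitary, sigma))
d_noisy = trace_norm(apply_kraus(amp_damp, rho) - apply_kraus(amp_damp, sigma))
print(d0, d_unitary, d_noisy)
```

The unitary step leaves the distance at $\sqrt{2}$. The damped step reduces it to $\sqrt{(1-\gamma)^2 + (1-\gamma)}$.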

The discussion above suggests that, apart from insensitivity to small perturbations of 
states and operations, we should also pay attention to insensitivity to initial conditions, i.e., 
the situation where two markedly different input density operators will, over time, evolve 
into effectively indistinguishable output density operators due to the rapid shrinking of the 
distance between the corresponding orbits in the state space. One of the central goals of 
this dissertation is to investigate noisy channels with the property that any two orbits get 
uniformly exponentially close to each other with time. In a sense, this is the "worst" kind 
of noise because it renders the result of any sufficiently lengthy computation essentially 
useless, as it cannot be distinguished reliably from the result due to any other input state. 
Noisy channels with this property will be referred to as strictly contractive. 
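For a picture of orbits becoming uniformly exponentially close, consider the qubit depolarizing channel $T(\rho) = (1-p)\rho + p\,I/2$. This is our choice of example, with a hypothetical $p = 0.05$. Since $T(\rho) - T(\sigma) = (1-p)(\rho - \sigma)$, the trace distance between any two orbits shrinks by exactly the factor $1-p$ per step. This holds even for the orthogonal, maximally distinguishable pair used below.

```python
import numpy as np

def trace_norm(m):
    """Trace norm of a Hermitian matrix: sum of |eigenvalues|."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(m))))

def depolarize(rho, p):
    """Depolarizing channel T(rho) = (1 - p) rho + p tr(rho) I / d."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.trace(rho) * np.eye(d) / d

p = 0.05
rho = np.array([[1, 0], [0, 0]], dtype=complex)    # |0><0|
sigma = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|: trace distance 2

distances = []
for n in range(50):
    distances.append(trace_norm(rho - sigma))
    rho, sigma = depolarize(rho, p), depolarize(sigma, p)

# Uniform exponential closeness: ||rho_n - sigma_n||_1 = (1 - p)^n ||rho_0 - sigma_0||_1.
predicted = [2 * (1 - p) ** n for n in range(50)]
```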

Why do we choose to focus on this seemingly extreme noise model? First of all, as we 
will show later on, any noisy channel can be approximated arbitrarily closely by a strictly 
contractive channel, such that the two cannot be distinguished by any experimental means. 
Essentially this implies that if a given noiseless channel is perturbed to a noisy one, we 
may as well assume that the latter is strictly contractive. Secondly, we wish to incorporate 
the finite-precision assumption into our mathematical framework. In particular, we want 
our model to be such that, in the presence of noise, there is always a nonzero probability 
of making an error when attempting to distinguish between any two quantum states, even 
when these states are, in principle, maximally distinguishable. As we will demonstrate, this 
desideratum is fulfilled by strictly contractive channels. Finally, the strictly contractive 
model provides a tool for investigations into the statistical dynamics of noisy quantum 
channels. In particular, the model already accommodates two important ingredients for 
a theory of approach to equilibrium, namely ergodicity and mixing (cf. [72] or [108, pp. 
54-60, 237-243]). 
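Both properties can be illustrated with one hypothetical construction. It is an illustration only and not necessarily the construction behind the density theorem proved later. Mix a channel $T$ with the channel that replaces every input by $I/d$:

$$T_\epsilon(\rho) = (1-\epsilon)T(\rho) + \epsilon I/d.$$

Then $\|T_\epsilon(\rho) - T_\epsilon(\sigma)\|_1 = (1-\epsilon)\|T(\rho) - T(\sigma)\|_1 \le (1-\epsilon)\|\rho - \sigma\|_1$. At the same time, $\|T(\rho) - T_\epsilon(\rho)\|_1 \le 2\epsilon$ on every state. The sketch only compares outputs on single states; it does not compute the norm of complete boundedness. It also evaluates the minimum error probability for discriminating two equiprobable states, $\frac{1}{2}(1 - \frac{1}{2}\|\rho - \sigma\|_1)$ (the Helstrom formula). After the perturbation this stays strictly positive, even for orthogonal inputs.

```python
import numpy as np

def trace_norm(m):
    """Trace norm of a Hermitian matrix: sum of |eigenvalues|."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(m))))

def helstrom_error(rho, sigma):
    """Minimum error probability for equal priors: (1 - ||rho - sigma||_1 / 2) / 2."""
    return 0.5 * (1 - 0.5 * trace_norm(rho - sigma))

def mix_with_replacement(channel, eps):
    """Hypothetical construction T_eps(rho) = (1 - eps) T(rho) + eps I/d."""
    def t_eps(rho):
        d = rho.shape[0]
        return (1 - eps) * channel(rho) + eps * np.eye(d) / d
    return t_eps

theta = 0.3
u = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]], dtype=complex)

def ideal(rho):
    """Noiseless (unitary) channel."""
    return u @ rho @ u.conj().T

eps = 1e-3
noisy = mix_with_replacement(ideal, eps)

rho = np.diag([1.0, 0.0]).astype(complex)
sigma = np.diag([0.0, 1.0]).astype(complex)

err_ideal = helstrom_error(ideal(rho), ideal(sigma))  # 0: perfectly distinguishable
err_noisy = helstrom_error(noisy(rho), noisy(sigma))  # eps / 2 > 0
closeness = trace_norm(ideal(rho) - noisy(rho))       # at most 2 * eps
```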

Let us quickly recall these notions and outline the way they relate to noisy quan- 
tum systems. The content of the so-called ergodic hypothesis of statistical mechanics can 
be succinctly stated as the equivalence of statistical averages and time averages. Physi- 
cists usually take the pragmatic approach, assuming that the ergodic hypothesis holds in 
any physically meaningful situation (cf., e.g., [71, p. 4]). Rigorous proofs of ergodicity 
have been obtained only for very few cases (see, e.g., Sinai and Chernov [126]), none of 
which are particularly interesting. The sad fate of the ergodic programme in classical 
Hamiltonian mechanics had been sealed further by the famous Kolmogorov-Arnold-Moser 
(KAM) theorem [120, p. 155], which states that the majority of Hamiltonian evolutions 
do not satisfy the ergodic hypothesis.¹ Quantum systems (spin systems in particular), 
however, still serve as fruitful soil for various investigations into ergodic theory [1, Ch. 7]. 
Discrete-time quantum channels are especially amenable to such studies; a general quantum 
channel $T$ on density operators is termed ergodic² if there exists a unique density 
operator $\rho_T$ such that $T(\rho_T) = \rho_T$. If $\{\rho_n\}$ is an orbit generated by an ergodic channel $T$, 
then it can be shown that, for any observable $A$, the time average $\frac{1}{N}\sum_{n=0}^{N-1} \langle A \rangle_n$, where 
$\langle A \rangle_n := \mathrm{tr}(A\rho_n)$, converges to the fixed-point average $\langle A \rangle_T := \mathrm{tr}(A\rho_T)$ as $N \to \infty$. 
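A standard toy example shows that ergodicity in this sense does not force $\langle A \rangle_n$ itself to converge. The hypothetical qubit channel `T` below first dephases in the computational basis and then applies the bit flip $X$. Its only fixed density operator is $\rho_T = I/2$. The sketch confirms this by counting the eigenvalues equal to 1 of the channel's matrix acting on vectorized $2 \times 2$ matrices. Along the orbit of $|0\rangle\langle 0|$, $\langle Z \rangle_n$ alternates between $+1$ and $-1$, yet its time average tends to $\mathrm{tr}(Z\rho_T) = 0$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def T(rho):
    """Hypothetical example channel: dephase in the Z basis, then flip with X."""
    return X @ np.diag(np.diag(rho)) @ X

# Matrix of T on row-major vectorized 2x2 matrices; column 2i+j is vec(T(E_ij)).
basis = [np.outer(np.eye(2)[i], np.eye(2)[j]).astype(complex)
         for i in range(2) for j in range(2)]
S = np.column_stack([T(E).reshape(-1) for E in basis])
eigvals = np.linalg.eigvals(S)
n_fixed = int(np.sum(np.abs(eigvals - 1) < 1e-9))  # dimension of the fixed-point space

rho = np.diag([1.0, 0.0]).astype(complex)  # |0><0|
expectations = []
for n in range(1000):
    expectations.append(np.trace(Z @ rho).real)
    rho = T(rho)

time_average = sum(expectations) / len(expectations)
fixed_point_average = np.trace(Z @ np.eye(2) / 2).real  # rho_T = I/2
```

The matrix `S` also has the eigenvalue $-1$. This is the period-two oscillation that keeps $\langle Z \rangle_n$ from converging.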

There exists also a stronger property, called mixing. In simple terms, a channel $T$ is 
mixing if, for any observable $A$, we have $\langle A \rangle_n \to \langle A \rangle_T$ as $n \to \infty$. Mixing obviously 
implies ergodicity, but the converse is not necessarily true. It turns out that strictly 
contractive channels are mixing, and hence ergodic. One of the most original thinkers on 
the subject of statistical physics, Nikolai Krylov, believed [72] that mixing, rather than 
ergodicity, should play a central role in the theory of approach to equilibrium. In particular, 
he emphasized the importance of the so-called relaxation time, i.e., the time after which 
the system will be found, with very high probability, in a state very close to equilibrium. 
He showed that mixing, and not ergodicity, is necessary for obtaining correct estimates 
of the relaxation time. Qualitatively we can say that approach to equilibrium should be 



¹Incidentally, it has recently been noted by Novikov [112] that the results of Kolmogorov, Arnold, and 
Moser have not been fully proved. One can only wonder whether this will revive the research into the 
ergodic hypothesis for classical Hamiltonian systems. 

²An alternative (and, in many respects, more natural) definition of ergodicity can be formulated for 
transformations of observables, i.e., for the Heisenberg picture of quantum dynamics. 
exponentially fast, as confirmed by experimental evidence, and this is precisely the feature 
that strictly contractive channels will be shown to possess. 
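Exponential approach to equilibrium, and the relaxation time it yields, can be illustrated with the qubit generalized amplitude damping channel. This is our own example, with hypothetical parameters $\gamma = 0.2$ and $q = 0.8$. The channel has the fixed point $\rho_T = \mathrm{diag}(q, 1-q)$ and contracts all trace distances by at least the factor $\sqrt{1-\gamma}$. For the observable $Z$ one has $\langle Z \rangle_n - \langle Z \rangle_T = (1-\gamma)^n(\langle Z \rangle_0 - \langle Z \rangle_T)$. In the sketch, the relaxation time is defined (again by our choice) as the first step at which $|\langle Z \rangle_n - \langle Z \rangle_T| < \delta$.

```python
import math
import numpy as np

def gad_kraus(gamma, q):
    """Generalized amplitude damping; its fixed point is diag(q, 1 - q)."""
    a, b = math.sqrt(q), math.sqrt(1 - q)
    g, h = math.sqrt(gamma), math.sqrt(1 - gamma)
    return [a * np.array([[1, 0], [0, h]]), a * np.array([[0, g], [0, 0]]),
            b * np.array([[h, 0], [0, 1]]), b * np.array([[0, 0], [g, 0]])]

def apply(kraus, rho):
    """Channel in Kraus form: rho -> sum_k K rho K^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)

gamma, q = 0.2, 0.8        # hypothetical damping rate and equilibrium population
kraus = gad_kraus(gamma, q)
Z = np.diag([1.0, -1.0])
rho_T = np.diag([q, 1 - q])
z_T = np.trace(Z @ rho_T)  # equilibrium value <Z>_T = 2q - 1

rho = np.diag([0.0, 1.0])  # start far from equilibrium
deviations = []
for n in range(60):
    deviations.append(abs(np.trace(Z @ rho) - z_T))
    rho = apply(kraus, rho)

# Relaxation time: first step at which <Z>_n is within delta of equilibrium.
delta = 1e-3
n_relax = next(n for n, dev in enumerate(deviations) if dev < delta)
```

Because the decay is geometric, the relaxation time is predicted in closed form by $\lceil \log(\delta / |\langle Z \rangle_0 - \langle Z \rangle_T|) / \log(1-\gamma) \rceil$, which here gives 34 steps.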

One of our central results is the following: errors modeled by strictly contractive chan- 
nels cannot be corrected perfectly. This result, while of a negative nature, does not come 
as a complete surprise: in a nonideal setting, impossibility of perfect error correction can 
only be expected. We will, however, present an argument that some form of "approximate" 
error correction will still be useful in many circumstances. In particular, we will discuss 
the possibility of either (a) going beyond the circuit model of quantum computation, or 
(b) finding ways to introduce enough parallelism into our quantum information processing 
so as to finish any job we need to do before the effect of errors becomes appreciable. 

In this respect we will mention an intriguing possibility of realizing quantum information 
processing in massively parallel arrays of interacting particles (quantum cellular 
automata [109]). One advantage furnished by such systems is the possibility of a phase 
transition, i.e., a marked change in macroscopic behavior that occurs when the values of 
suitable parameters cross some critical threshold. In the classical case, the stereotypi- 
cal example is provided by the two-dimensional Ising ferromagnet which, at sufficiently 
low temperatures, can "remember" the direction of an applied magnetic field even after 
the field is turned off. This phenomenon is, of course, at the basis of magnetic storage 
devices. The concept of a quantum phase transition [115] is tied to the ground-state be- 
havior of perturbed quantum spin systems on a lattice and refers to an abrupt change in 
the macroscopic nature of the ground state as the perturbation strength is varied. We 
will discuss this concept in greater detail later on; here we only mention that existence 
of a quantum phase transition can be exploited fruitfully for reliable storage of quantum 
information in the subcritical region at low temperatures (assuming that the ground state 
carries sufficient degeneracy, so as to accommodate the necessary amount of information). 
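The classical memory effect of the two-dimensional Ising ferromagnet can be reproduced with a few lines of Metropolis dynamics. This is the classical analogue only, not the quantum construction discussed later. Lattice size, temperatures, number of sweeps, and seed are hypothetical choices, with $J = k_B = 1$ and zero field. The lattice starts in the all-up configuration that a strong applied field would have imposed. Below the critical temperature $T_c = 2/\ln(1+\sqrt{2}) \approx 2.269$ the magnetization persists after the field is removed; well above $T_c$ it is lost.

```python
import math
import random

def magnetization_after(temperature, size=16, sweeps=200, seed=1):
    """Metropolis dynamics of the 2D Ising ferromagnet (J = 1, k_B = 1, zero field,
    periodic boundaries), started from the all-up configuration."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]
    for _ in range(sweeps * size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                      + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
        delta_e = 2 * spins[i][j] * neighbours  # energy cost of flipping spin (i, j)
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -spins[i][j]
    return sum(map(sum, spins)) / size ** 2

m_low = magnetization_after(1.5)   # below T_c (approx. 2.269): the lattice "remembers"
m_high = magnetization_after(5.0)  # well above T_c: the memory is erased
```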

The dissertation is organized as follows. In Chapter 2 we give a quick introduction 
to the mathematical formalism of quantum information theory. Then, in Chapter 3, we 
discuss strictly contractive quantum channels. Chapter 4 is devoted to the study 
of noisy quantum registers and computers in terms of the entropy-energy balance. In 
particular, we give an entropic interpretation of strict contractivity for bistochastic strictly 
contractive channels. In Chapter 5 we briefly comment on the possibility of reliable storage 
of quantum information in spin systems on a lattice. Concluding remarks are given in 
Chapter 6. The necessary mathematical background is collected in Appendix A; Appendix 
B contains the list of symbols used throughout the dissertation. 



CHAPTER 2 



Basic notions of quantum 
information theory 



In this chapter we introduce the abstract formalism of quantum information theory. But, 
before we proceed, it is pertinent to ask: what exactly is quantum information? Here is a 
definition taken from an excellent survey article of Werner [144]. 

Quantum information is that kind of information which 
is carried by quantum systems from the preparation 
device to the measuring apparatus in a quantum- 
mechanical experiment. 

Of course, this definition is somewhat vague about the general notion of "information," 
but we can take the pragmatic approach and say that the information about a given 
physical system includes the specification of the initial state of the system, as well as any 
other knowledge that can be used to predict the state of the system at some later time. 
Note that we are not talking about any quantitative measures of "information content." 
For this reason, such notions as channel capacity will be conspicuously absent from our 
presentation. For a lucid account of quantum channel capacity, the reader is referred to 
the surveys of Bennett and Shor [10] and Werner [144]. 

2.1 Classical systems vs. quantum systems 

Classical systems are distinguished from their quantum counterparts through such char- 
acteristics as size (macroscopic vs. microscopic) or the nature of their energy spectrum 
(continuous vs. discrete). For example, an electromagnetic pulse sent through an optical 
fiber can be thought of as classical, whereas a single photon sent through the fiber is re- 
garded as quantum. The most conspicuous differences, however, are revealed through the 
statistics of experiments performed upon these systems. For instance, the joint probability 
distribution of a number of classical random variables always has the form of a limit of 
convex combinations of product probability distributions, but this is generally not so in 
the quantum case. 

In this section we introduce the mathematical formalism necessary for capturing the es- 
sential features of classical and quantum systems. Our exposition closely follows Werner's 
survey [111]. The requisite background on operator algebras is collected in Appendix A. 

2.1.1 Algebras of observables 

For each physical system we need an abstract description that would account not only for 
the classical/quantum distinction, but also for such features as the structure of the set of 
all possible configurations of the system. Such a description is possible through defining 
the algebra of observables of the system. In order to cover both classical and quantum 
systems, we will require from the outset that their algebras of observables be C*-algebras 
with identity. For the moment, we do not elaborate on the reasons for this choice, hoping 
that they will become clear as we go along. 

Anyone who has taken an introductory course in quantum mechanics knows that the 
presence of noncommuting observables is the most salient feature of the quantum formal- 
ism. Therefore we take for granted that the algebra of observables of a quantum system 
must be noncommutative, whereas the algebra of observables of a classical system must 
be commutative (abelian). Thus, without loss of generality, the algebra of observables of 
a quantum system is the algebra $\mathcal{B}(\mathcal{H})$ of bounded operators on some Hilbert space $\mathcal{H}$, 
whereas the corresponding algebra for a classical system is the algebra $C(\mathcal{X})$ of continuous 
complex-valued functions on a compact set $\mathcal{X}$.¹ 

Let us illustrate this high-level statement with some concrete examples. First we treat 
the simplest classical case, namely the classical bit. Here the set $\mathcal{X}$ is the two-element 
set $\{0,1\}$, and the corresponding algebra of observables is the set of all complex-valued 
functions on $\{0,1\}$. We can think of an element of this algebra of observables as a random 
variable defined on the two-element sample space $\{0,1\}$. The simplest example of a quantum 
system, the quantum bit (or qubit), is furnished by considering a two-dimensional complex 
Hilbert space $\mathcal{H} \cong \mathbb{C}^2$, and the algebra of observables is nothing but the set $M_2$ of $2 \times 2$ 
complex matrices. 
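The commutative/noncommutative distinction between the classical bit and the qubit can be seen in a few lines. In this sketch, which is our own illustration, a function on $\{0,1\}$ is stored as the vector of its two values, so the algebra product is pointwise multiplication. Equivalently, such a function can be viewed as a diagonal $2 \times 2$ matrix. The Pauli matrices $X$ and $Z$ are two elements of $M_2$ that do not commute.

```python
import numpy as np

# Classical bit: an observable is a function on {0, 1}, stored as (f(0), f(1)).
f = np.array([2.0, -1.0])
g = np.array([0.5, 3.0])
classical_commute = np.array_equal(f * g, g * f)  # pointwise product

# The same functions viewed as diagonal matrices also commute.
F, G = np.diag(f), np.diag(g)
diagonal_commute = np.array_equal(F @ G, G @ F)

# Qubit: observables are 2 x 2 complex matrices; Pauli X and Z do not commute.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
commutator = X @ Z - Z @ X  # equals [[0, -2], [2, 0]]
```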

In general, the structure of the configuration space of the system is reflected in the set 
$\mathcal{X}$ (in the classical case) or the Hilbert space $\mathcal{H}$ (in the quantum case). Thus a set $\mathcal{X}$ 
with $|\mathcal{X}| = n$ would be associated with a classical system with an $n$-element configuration 
space; similarly, the underlying Hilbert space of a spin-$S$ quantum object would be 
$(2S+1)$-dimensional. We can also describe systems with countably infinite or uncountable 
configuration spaces, e.g., the classical Heisenberg spin with the set $\mathcal{X}$ being $S^2$ (the 
unit sphere in $\mathbb{R}^3$), or a single mode of an electromagnetic field with the Hilbert space 
isomorphic to the space $\ell^2$ of square-summable infinite sequences of complex numbers. For 
simplicity let us suppose that, from now on, the algebras of observables with which we 

¹We have allowed ourselves a simplification which consists in requiring that the set $\mathcal{X}$ be compact; 
this is not the case for a general abelian C*-algebra, where the set $\mathcal{X}$ can be merely a locally compact 
space, but, because we have assumed that any algebra of observables must have an identity, the set $\mathcal{X}$ 
will, in fact, be compact. 
deal are finite-dimensional. This implies that, if we are dealing with the algebra $C(\mathcal{X})$, 
then the set $\mathcal{X}$ is finite; similarly, given the algebra $\mathcal{B}(\mathcal{H})$, the Hilbert space $\mathcal{H}$ must be 
finite-dimensional. 

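For the purposes of calculations it is often convenient to expand elements of an algebra 
in a basis. A canonical basis for $C(\mathcal{X})$ is the set of functions $e_x$, $x \in \mathcal{X}$, defined by 

$$e_x(y) := \begin{cases} 1, & y = x, \\ 0, & y \neq x, \end{cases} \qquad (2.1)$$

so that any function $f \in C(\mathcal{X})$ can be expanded as $f = \sum_{x \in \mathcal{X}} f(x) e_x$. A basis for $\mathcal{B}(\mathcal{H})$ 
is constructed by picking any orthonormal basis $\{e_i\}$ for $\mathcal{H}$ and defining the "standard 
matrix units" $e_{ij} := |e_i\rangle\langle e_j|$. Thus for any $X \in \mathcal{B}(\mathcal{H})$ we have $X = \sum_{i,j} x_{ij} e_{ij}$ with 
$x_{ij} \in \mathbb{C}$. 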
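The expansion in matrix units can be checked directly. The coefficients are $x_{ij} = \langle e_i | X e_j \rangle$, and the units multiply as $e_{ij}e_{kl} = \delta_{jk} e_{il}$. As a hypothetical setup, the sketch uses a random orthonormal basis of $\mathbb{C}^3$ (the columns of a random unitary) and a random matrix $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# An arbitrary orthonormal basis {e_i} of C^d: the columns of a random unitary.
q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
basis = [q[:, i] for i in range(d)]

def matrix_unit(i, j):
    """Standard matrix unit e_ij = |e_i><e_j|."""
    return np.outer(basis[i], basis[j].conj())

X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

# Coefficients x_ij = <e_i | X e_j>; np.vdot conjugates its first argument.
coeffs = {(i, j): np.vdot(basis[i], X @ basis[j]) for i in range(d) for j in range(d)}
X_rebuilt = sum(c * matrix_unit(i, j) for (i, j), c in coeffs.items())
```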

In order to describe composite systems, i.e., systems built up from several subsystems, 
we need a way of combining algebras to form new algebras. Let us consider bipartite 
systems first, starting with the classical case. Suppose we are given two classical systems, 
$\Sigma_1$ and $\Sigma_2$, with configuration spaces $\mathcal{X}$ and $\mathcal{Y}$, respectively. Then the configuration of 
the joint system, $\Sigma_1 + \Sigma_2$, is characterized by giving an ordered pair $(x, y)$ with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. 
Thus the configuration space of the joint system is simply the Cartesian product $\mathcal{X} \times \mathcal{Y}$, 
i.e., the set of all ordered pairs of the kind described above. The corresponding algebra of 
observables is $C(\mathcal{X} \times \mathcal{Y})$, i.e., the algebra of functions $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{C}$. Any element $f$ 
of this algebra can be written in the form 

$$f = \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} f(x,y)\, e_{xy}, \qquad (2.2)$$

where the basis functions $e_{xy}$ are defined in a manner similar to Eq. (2.1). Furthermore, 
for any $x' \in \mathcal{X}$ and $y' \in \mathcal{Y}$ we have $e_{xy}(x', y') = e_x(x')\, e_y(y')$. On the other hand, a 
general element of the tensor product $C(\mathcal{X}) \otimes C(\mathcal{Y})$ has the form 

$$f = \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} f(x,y)\, e_x \otimes e_y. \qquad (2.3)$$

Directly comparing Eqs. (2.2) and (2.3), we see that $C(\mathcal{X}) \otimes C(\mathcal{Y}) \cong C(\mathcal{X} \times \mathcal{Y})$. 
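The identification $e_x \otimes e_y \leftrightarrow e_{xy}$ can be made concrete by storing functions as value vectors. The Kronecker product of two indicator vectors is then the indicator vector of the pair, provided $\mathcal{X} \times \mathcal{Y}$ is listed in lexicographic order. The configuration spaces and the test function `f` below are hypothetical.

```python
from itertools import product
import numpy as np

X_space = [0, 1]            # hypothetical configuration space of Sigma_1
Y_space = ["a", "b", "c"]   # hypothetical configuration space of Sigma_2
pairs = list(product(X_space, Y_space))  # X x Y in lexicographic order

def indicator(space, point):
    """Basis function e_point of C(space), as the vector of its values on space."""
    return np.array([1.0 if s == point else 0.0 for s in space])

# e_x (x) e_y, realized as a Kronecker product, coincides with e_{xy} on X x Y.
basis_match = all(
    np.array_equal(np.kron(indicator(X_space, x), indicator(Y_space, y)),
                   indicator(pairs, (x, y)))
    for x, y in pairs)

# Expanding a function f on X x Y both ways gives the same vector (Eqs. (2.2) and (2.3)).
f = {(x, y): 10 * x + ord(y) for x, y in pairs}
via_product_space = sum(f[pt] * indicator(pairs, pt) for pt in pairs)
via_tensor_product = sum(f[(x, y)] * np.kron(indicator(X_space, x), indicator(Y_space, y))
                         for x, y in pairs)
```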

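In the quantum case we start by taking the tensor product of the Hilbert spaces of the 
subsystems. Consider two quantum systems with the Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$. Let $\{e_i\}$ 
and $\{f_k\}$ be orthonormal bases of $\mathcal{H}$ and $\mathcal{K}$, respectively. Then the set $\{e_i \otimes f_k\}$ is the 
corresponding orthonormal basis of the tensor product space $\mathcal{H} \otimes \mathcal{K}$. A typical element 
of the algebra $\mathcal{B}(\mathcal{H} \otimes \mathcal{K})$ has the form 

$$X = \sum_{i,j,k,l} x_{ik,jl}\, |e_i \otimes f_k\rangle\langle e_j \otimes f_l|,$$

and a typical element of the product algebra $\mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathcal{K})$ has the similar form 
$\sum_{i,j,k,l} x_{ik,jl}\, |e_i\rangle\langle e_j| \otimes |f_k\rangle\langle f_l|$. Since $|e_i \otimes f_k\rangle\langle e_j \otimes f_l| = |e_i\rangle\langle e_j| \otimes |f_k\rangle\langle f_l|$, 
we conclude that $\mathcal{B}(\mathcal{H} \otimes \mathcal{K}) \cong \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathcal{K})$. 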
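The correspondence between matrix units of $\mathcal{H} \otimes \mathcal{K}$ and tensor products of matrix units can be verified index by index with Kronecker products. The sketch takes the hypothetical dimensions $\dim\mathcal{H} = 2$ and $\dim\mathcal{K} = 3$ and uses the standard bases.

```python
from itertools import product
import numpy as np

dH, dK = 2, 3                    # hypothetical dimensions of H and K
eH, eK = np.eye(dH), np.eye(dK)  # orthonormal bases {e_i} and {f_k}

def ket_bra(u, v):
    """|u><v|"""
    return np.outer(u, v.conj())

# |e_i (x) f_k><e_j (x) f_l| equals |e_i><e_j| (x) |f_k><f_l| for every index choice,
# so the matrix units of B(H (x) K) are exactly the tensor products of matrix units.
units_match = all(
    np.array_equal(ket_bra(np.kron(eH[i], eK[k]), np.kron(eH[j], eK[l])),
                   np.kron(ket_bra(eH[i], eH[j]), ket_bra(eK[k], eK[l])))
    for i, j, k, l in product(range(dH), range(dH), range(dK), range(dK)))

n_units = dH ** 2 * dK ** 2  # equals dim B(H (x) K) = (dH * dK) ** 2
```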

In both the classical case and the quantum case we see that the algebra of observables of 
the bipartite system, whose subsystems are assigned algebras $A$ and $B$, has the form $A \otimes B$. 
Algebras of observables for multipartite systems can now be constructed inductively. Using 
the tensor product, it is possible to define algebras of observables for hybrid systems, 
i.e., systems with both classical and quantum subsystems. This is not necessary for our 
purposes, and therefore we will not dwell on this. An interested reader is referred to 
Werner's survey [144] for details. 

2.1.2 Pure and mixed states 

Our next step is to describe the statistics of both classical and quantum systems in a unified 
fashion. This is accomplished by introducing states over the algebra of observables of the 
system. Recall that a state over a C*-algebra $\mathcal{A}$ is a positive normalized linear functional 
on $\mathcal{A}$, i.e., a mapping $\omega : \mathcal{A} \to \mathbb{C}$ that maps all positive elements of $\mathcal{A}$ to nonnegative 
real numbers, and for which we have $\omega(I) = 1$, where $I$ is the identity element of $\mathcal{A}$. The 
number $\omega(A)$ then gives the expected value of the observable $A$ measured on the system 
in the state $\omega$. 

The positive elements of a C*-algebra $\mathcal{A}$ are precisely those elements that can be 
written in the form $B^*B$ for some $B \in \mathcal{A}$. In the case of the algebra $C(\mathcal{X})$, a function 
$f$ is a positive element if and only if $f(x) \ge 0$ for all $x \in \mathcal{X}$ or, equivalently, if and only 
if $f(x) = |g(x)|^2$ for some $g \in C(\mathcal{X})$. In the case of $\mathcal{B}(\mathcal{H})$, an operator $X$ is positive if 
and only if $\langle\psi|X\psi\rangle \ge 0$ for all $\psi \in \mathcal{H}$ or, equivalently, if and only if $X = Y^*Y$ for some 
$Y \in \mathcal{B}(\mathcal{H})$. 

Of especial importance to the statistical framework of quantum information theory is the subset of $\mathcal{A}$ consisting of those elements $F$ for which $F \ge 0$ and $I - F \ge 0$ (this is written as a double inequality $0 \le F \le I$). These observables are referred to as effects, the term introduced by Ludwig in his axiomatic treatment of quantum theory [85]. It is obvious that, for any effect $F$ and any state $\omega$, $0 \le \omega(F) \le 1$. Furthermore, given a collection $\{F_a\}$ of effects with $\sum_a F_a = I$, we will have $\sum_a \omega(F_a) = 1$. Thus, in the most general formulation, to each outcome $o$ of an experiment performed on the system, classical or quantum, we associate an effect $F_o$, such that $\omega(F_o)$ is the probability of getting the outcome $o$ when the system is in the state $\omega$. Obviously, $\sum_{o \in \mathcal{O}} \omega(F_o) = 1$, where $\mathcal{O}$ is the set of all possible outcomes of the experiment.

Having said this, let us first treat states in the classical setting. If $\omega$ is a state over the algebra $C(\mathcal{X})$, then $0 \le \omega(e_x) \le 1$ for all $x \in \mathcal{X}$. This follows from the fact that $\omega(I) = \sum_x \omega(e_x) = 1$ and from the positivity of $\omega$. Thus we see that any state $\omega$ over the algebra $C(\mathcal{X})$ gives rise to a probability distribution $\{p_x\}$ on $\mathcal{X}$, where $p_x := \omega(e_x)$. Conversely, given a probability distribution $\{p_x\}$ on $\mathcal{X}$, we can define a positive normalized linear functional on $C(\mathcal{X})$ in an obvious way, namely $\omega(f) := \sum_x p_x f(x)$. Therefore there is a one-to-one correspondence between the states over $C(\mathcal{X})$ and the probability distributions on $\mathcal{X}$. We have argued this for the case of a finite $\mathcal{X}$; in general, it is the content of the Riesz-Markov theorem [108, p. 107] that, given a compact Hausdorff space $\mathcal{X}$, there exists a one-to-one correspondence between positive normalized linear functionals on $C(\mathcal{X})$ and probability measures on $\mathcal{X}$.

Similarly, given a state $\omega$ over the algebra $\mathcal{B}(\mathcal{H})$ of a quantum system, we can associate with it a matrix $\rho$ whose elements in the basis $\{e_i\}$ have the form $\rho_{ij} := \omega(e_{ji})$, where $e_{ji} := |e_j\rangle\langle e_i|$. Thus, given any $X \in \mathcal{B}(\mathcal{H})$, we will have $\omega(X) = \sum_{i,j} X_{ij}\rho_{ji} = \operatorname{tr}(\rho X)$. The matrix $\rho$ is easily seen to have unit trace because $1 = \omega(I) = \sum_i \omega(e_{ii}) = \sum_i \rho_{ii} = \operatorname{tr}\rho$, and is also positive semidefinite because, for any $\psi \in \mathcal{H}$, $\langle\psi|\rho\psi\rangle = \omega(|\psi\rangle\langle\psi|) \ge 0$. Conversely, given a positive semidefinite matrix $\rho$ of unit trace, we can define a state over $\mathcal{B}(\mathcal{H})$ via $\omega(e_{ij}) := \operatorname{tr}(\rho e_{ij}) = \rho_{ji}$. Thus we see that, when the Hilbert space $\mathcal{H}$ of the system has finite dimension $n$, there is a one-to-one correspondence between states over $\mathcal{B}(\mathcal{H})$ and positive semidefinite $n \times n$ matrices of unit trace (called density matrices or density operators). This is not true in the case when $\mathcal{H}$ is infinite-dimensional: not every state $\omega$ over $\mathcal{B}(\mathcal{H})$ corresponds to a density operator. Those states that do have density operators associated with them are called normal states.
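The correspondence $\rho_{ij} := \omega(e_{ji})$ can be checked numerically. The following Python/NumPy sketch is only an illustration under stated assumptions: the state is a hypothetical mixture of two random vector states on $\mathbb{C}^3$, supplied as a plain function of $X$ with no density matrix in sight, and the helper names `mixture_state` and `density_matrix` are ours. It rebuilds $\rho$ entry by entry and confirms $\omega(X) = \operatorname{tr}(\rho X)$, unit trace, and positivity.

```python
import numpy as np

def mixture_state(weights, vectors):
    """Hypothetical state omega(X) = sum_k p_k <psi_k|X psi_k>, given only as a function of X."""
    def omega(X):
        return sum(p * np.vdot(v, X @ v) for p, v in zip(weights, vectors))
    return omega

def density_matrix(omega, n):
    """Matrix with entries rho_ij := omega(e_ji), where e_ji = |e_j><e_i|."""
    rho = np.zeros((n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            e_ji = np.zeros((n, n), dtype=complex)
            e_ji[j, i] = 1.0
            rho[i, j] = omega(e_ji)
    return rho

rng = np.random.default_rng(0)
n = 3
raw = rng.normal(size=(2, n)) + 1j * rng.normal(size=(2, n))
vectors = [v / np.linalg.norm(v) for v in raw]
omega = mixture_state([0.3, 0.7], vectors)
rho = density_matrix(omega, n)

X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
print(np.isclose(omega(X), np.trace(rho @ X)))    # omega(X) = tr(rho X)
print(np.isclose(np.trace(rho), 1.0))             # unit trace
print(np.linalg.eigvalsh(rho).min() >= -1e-12)    # positive semidefinite
```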

In light of the correspondence between states and probability measures (in the classical 
case) or density operators (in the quantum case), we will use the term "state" interchange- 
ably, referring either to the functional on the corresponding algebra of observables, or to 
the corresponding probability measure or the density operator. 

The set $S(\mathcal{A})$ of states over a C*-algebra $\mathcal{A}$ is a convex set whose extreme points are referred to as pure states. The adjective "pure" reflects the fact that these are the states with the least amount of "randomness": being extreme points of the set $S(\mathcal{A})$, they cannot be written as nontrivial convex combinations of other states. For this reason the pure states over an algebra of observables play a crucial role. In order to characterize the pure states over the algebra $C(\mathcal{X})$, we invoke the fact that a state $\omega$ over an abelian C*-algebra $\mathcal{A}$ is pure if and only if $\omega(AB) = \omega(A)\omega(B)$ for all $A, B \in \mathcal{A}$, as well as the fact that a state over $C(\mathcal{X})$ is determined by its action on the basis functions $e_x$. Because $e_x = e_x^2$, which means that $e_x(x') = e_x(x')^2$ for any $x' \in \mathcal{X}$, we have $\omega(e_x) = \omega(e_x)^2$ for $\omega$ pure, which implies that $\omega(e_x) \in \{0, 1\}$ for each $x \in \mathcal{X}$. Since $\sum_x \omega(e_x) = 1$, we conclude that, for each pure state $\omega$ over $C(\mathcal{X})$, there exists a unique $y \in \mathcal{X}$ such that

$$\omega(e_x) = \delta_{xy} \quad \text{for all } x \in \mathcal{X}.$$

This pure state corresponds to the probability measure $\delta_y$ concentrated on the single point $y \in \mathcal{X}$. Such measures are referred to as point measures. Conversely, defining the state $\omega_y$ corresponding to the point measure $\delta_y$, we can easily convince ourselves that $\omega_y(fg) = \omega_y(f)\omega_y(g)$ for all pairs $f, g \in C(\mathcal{X})$. Thus the pure states over the algebra $C(\mathcal{X})$ are in a one-to-one correspondence with the point measures on $\mathcal{X}$.

As for quantum systems, we know that there is a one-to-one correspondence between the set of states over $\mathcal{B}(\mathcal{H})$ and the convex set $S(\mathcal{H})$ of the density operators on $\mathcal{H}$ (again, we assume that the Hilbert space $\mathcal{H}$ is finite-dimensional). Furthermore, this correspondence is affine, i.e., convex combinations of states over $\mathcal{B}(\mathcal{H})$ correspond to convex combinations of density operators on $\mathcal{H}$. Hence there is a one-to-one correspondence between the extreme points of the respective sets. The extreme points of $S(\mathcal{H})$ are the one-dimensional projectors, i.e., those density matrices $\rho$ for which $\rho^2 = \rho$. Thus the pure states over $\mathcal{B}(\mathcal{H})$ correspond precisely to the one-dimensional projectors in $\mathcal{B}(\mathcal{H})$ (or, equivalently, to the unit vectors in $\mathcal{H}$, up to a phase factor).

States that are not pure are referred to as mixed; they correspond to non-extreme points of the corresponding state spaces. According to the Krein-Milman theorem [118, p. 67], any point of a compact convex set $S$ in a locally convex topological vector space is a limit of convex combinations of the extreme points of $S$. In fact, in finite dimensions a stronger result due to Carathéodory [100, p. 7] states that any point in a compact convex subset $S$ of an $n$-dimensional space is a convex combination of at most $n + 1$ extreme points of $S$. Therefore any mixed state over $C(\mathcal{X})$ can be represented as a convex mixture of point measures on $\mathcal{X}$, while any mixed state over $\mathcal{B}(\mathcal{H})$ is a convex mixture of one-dimensional projections on $\mathcal{H}$. In either case the operation of forming a convex combination of pure states can be thought of as introducing "classical" randomness. In this respect an important role is played by the so-called maximally mixed states, i.e., those states that are "most random." The maximally mixed state over the classical algebra of observables $C(\mathcal{X})$ corresponds to the normalized counting measure on $\mathcal{X}$, i.e., to the measure that assigns the value $1/|\mathcal{X}|$ to each $x \in \mathcal{X}$. The maximally mixed state over $\mathcal{B}(\mathcal{H})$ corresponds to the normalized identity matrix, $I/\dim\mathcal{H}$. The reason for the name "maximally mixed" will become apparent when we discuss entropy in Sec. 4.1.
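In the quantum case, the spectral decomposition $\rho = \sum_i q_i |e_i\rangle\langle e_i|$ already exhibits a density matrix as a convex combination of one-dimensional projections, with at most $\dim\mathcal{H}$ terms. The sketch below (Python with NumPy; the example matrix and the function names are hypothetical) extracts that mixture, tests purity through the criterion $\rho^2 = \rho$, and checks that the maximally mixed qubit state has $\operatorname{tr}\rho^2 = 1/2$.

```python
import numpy as np

def pure_decomposition(rho, tol=1e-12):
    """Spectral decomposition rho = sum_i q_i |e_i><e_i|, keeping only q_i > tol.
    Returns the weights q_i and the one-dimensional projections |e_i><e_i|."""
    q, vecs = np.linalg.eigh(rho)
    keep = q > tol
    projections = [np.outer(v, v.conj()) for v in vecs[:, keep].T]
    return q[keep], projections

def is_pure(rho):
    """A density matrix is pure iff rho^2 = rho."""
    return bool(np.allclose(rho @ rho, rho))

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
weights, projections = pure_decomposition(rho)
reconstructed = sum(w * P for w, P in zip(weights, projections))
maximally_mixed = np.eye(2) / 2

print(weights, weights.sum())                        # a probability vector
print([is_pure(P) for P in projections])             # each component is pure
print(is_pure(rho), is_pure(maximally_mixed))        # both are mixed
print(np.trace(maximally_mixed @ maximally_mixed))   # 1/2 = 1/dim H
```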

States of composite systems are defined by means of the tensor product construction. In other words, a state of the system with the algebra of observables $\mathcal{A} \otimes \mathcal{B}$ is a positive normalized linear functional over $\mathcal{A} \otimes \mathcal{B}$. Again, any state over $\mathcal{A} \otimes \mathcal{B}$ will be a convex combination of pure states. In the classical case, $\mathcal{A} = C(\mathcal{X})$ and $\mathcal{B} = C(\mathcal{Y})$, pure states correspond to the point measures $\delta_{(x,y)} = \delta_x \otimes \delta_y$. Thus any state over $C(\mathcal{X}) \otimes C(\mathcal{Y})$ has the form

$$\omega = \sum_{x,y} p_{xy}\, \delta_x \otimes \delta_y, \qquad 0 \le p_{xy} \le 1, \quad \sum_{x,y} p_{xy} = 1,$$

i.e., it can be written as a convex combination of product measures. This is not so for the states over $\mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathcal{K})$, where $\mathcal{H}$ and $\mathcal{K}$ are Hilbert spaces. Now the pure states correspond to unit vectors in $\mathcal{H} \otimes \mathcal{K}$, and it is a basic fact of the theory of tensor products that not every vector in $\mathcal{H} \otimes \mathcal{K}$ can be written in the product form $\psi \otimes \phi$ with $\psi \in \mathcal{H}$ and $\phi \in \mathcal{K}$. Consequently, not all states of a composite quantum system are separable in the following sense.

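Definition 2.1.1 A state $\omega$ of a composite system with the algebra of observables $\mathcal{A} \otimes \mathcal{B}$ is called separable (or classically correlated in the terminology of Werner [143]) if it can be written as

$$\omega = \sum_i p_i\, \omega_i^{\mathcal{A}} \otimes \omega_i^{\mathcal{B}}, \qquad (2.4)$$

where $\omega_i^{\mathcal{A}} \in S(\mathcal{A})$ and $\omega_i^{\mathcal{B}} \in S(\mathcal{B})$, with weights $p_i \ge 0$ such that $\sum_i p_i = 1$. Otherwise $\omega$ is called entangled.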

By contrast, every state of a composite classical system is separable, as we have seen above. This conclusion also holds in very general settings: every such state is a convex combination of point measures, and, because the point measures on a Cartesian product of sets are precisely the products of point measures on the factors [121, p. 32], the state is a convex combination of product measures and hence separable.

There are many interesting examples of entangled states. In the case of $\mathcal{H} \otimes \mathcal{H}$, where $\mathcal{H} \cong \mathbb{C}^2$, we can give an example of a family of entangled states whose state vectors also form an orthonormal basis of $\mathcal{H} \otimes \mathcal{H}$.
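Example 2.1.2 (the Bell basis) Let $|e_1\rangle$ and $|e_2\rangle$ be an orthonormal basis of $\mathbb{C}^2$. Then the pure states whose vectors form the so-called Bell basis,

$$\Phi^{+} = \tfrac{1}{\sqrt{2}}\,(e_1 \otimes e_1 + e_2 \otimes e_2), \qquad (2.5)$$

$$\Phi^{-} = \tfrac{1}{\sqrt{2}}\,(e_1 \otimes e_1 - e_2 \otimes e_2), \qquad (2.6)$$

$$\Psi^{+} = \tfrac{1}{\sqrt{2}}\,(e_1 \otimes e_2 + e_2 \otimes e_1), \qquad (2.7)$$

$$\Psi^{-} = \tfrac{1}{\sqrt{2}}\,(e_1 \otimes e_2 - e_2 \otimes e_1), \qquad (2.8)$$

are entangled states.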
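As a quick sanity check, the sketch below builds the four standard Bell vectors $(e_1\otimes e_1 \pm e_2\otimes e_2)/\sqrt2$ and $(e_1\otimes e_2 \pm e_2\otimes e_1)/\sqrt2$ with NumPy's Kronecker product, taking the standard basis of $\mathbb{C}^2$ as $\{e_1, e_2\}$ (an assumption for illustration; the helper name `coefficient_rank` is ours). It verifies orthonormality and uses the fact that a vector $\sum_{ij} c_{ij}\, e_i \otimes e_j$ is a product $\psi \otimes \phi$ exactly when the coefficient matrix $(c_{ij})$ has rank one.

```python
import numpy as np

e1 = np.array([1, 0], dtype=complex)
e2 = np.array([0, 1], dtype=complex)
s = 1 / np.sqrt(2)

# The four Bell vectors in C^2 (x) C^2
bell = {
    "Phi+": s * (np.kron(e1, e1) + np.kron(e2, e2)),
    "Phi-": s * (np.kron(e1, e1) - np.kron(e2, e2)),
    "Psi+": s * (np.kron(e1, e2) + np.kron(e2, e1)),
    "Psi-": s * (np.kron(e1, e2) - np.kron(e2, e1)),
}

def coefficient_rank(v, d=2):
    """Rank of the d x d coefficient matrix of v; rank 1 means v is a product vector."""
    return np.linalg.matrix_rank(v.reshape(d, d))

B = np.column_stack(list(bell.values()))
gram = B.conj().T @ B                                   # identity iff orthonormal
ranks = {name: coefficient_rank(v) for name, v in bell.items()}
product_rank = coefficient_rank(np.kron(e1, e2))        # a product vector, for contrast

print(np.allclose(gram, np.eye(4)))   # True
print(ranks)                          # every Bell vector has rank 2
print(product_rank)                   # 1
```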
The theory of entanglement is a rich subfield of quantum information theory, but, since
we are not directly concerned with entanglement in this work, we will limit ourselves to 
the very basic facts. The reader is encouraged to consult the survey article by M., P., and 
R. Horodecki [61] for further details. 

Given an arbitrary state $\rho$, it is in general not an easy task to decide whether it is entangled unless it is pure, in which case our job reduces to the analysis of the so-called Schmidt decomposition of the corresponding state vector. In order to define the Schmidt decomposition, we first need to look at the restriction of states to subsystems.

Definition 2.1.3 Let $\omega$ be a state over the algebra of observables $\mathcal{A} \otimes \mathcal{B}$. Then the restriction of $\omega$ to $\mathcal{A}$ is the unique state $\omega_{\mathcal{A}}$ determined by $\omega_{\mathcal{A}}(A) := \omega(A \otimes I_{\mathcal{B}})$ for any $A \in \mathcal{A}$.

The number $\omega_{\mathcal{A}}(A)$ should be thought of as the expected value of the observable $A$ which we measure on the subsystem with the algebra $\mathcal{A}$, completely ignoring the subsystem with the algebra $\mathcal{B}$.

In the classical case, where $\mathcal{A} \otimes \mathcal{B} = C(\mathcal{X}) \otimes C(\mathcal{Y})$, observables of the form $A \otimes I$ can be written as

$$A \otimes I = \sum_{x,y} A(x)\, e_x \otimes e_y,$$

so that the restriction to $\mathcal{A}$ of the state corresponding to the probability measure $p_{xy}$ on $\mathcal{X} \times \mathcal{Y}$ is the state corresponding to the probability measure $p^{\mathcal{A}}_x = \sum_y p_{xy}$, i.e., $\omega_{\mathcal{A}}(f) = \sum_{x,y} p_{xy} f(x)$. This is precisely the marginal probability distribution obtained by summing over the set $\mathcal{Y}$. It is easy to see that any pure state of a bipartite classical system restricts to a pure state on either subsystem.

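In the case of a quantum system, the restriction of a state $\rho$ over $\mathcal{A} \otimes \mathcal{B} = \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$ to $\mathcal{A}$ is determined by $\operatorname{tr}(\rho_{\mathcal{A}} A) = \operatorname{tr}[\rho(A \otimes I_{\mathcal{K}})]$, i.e., the corresponding density operator $\rho_{\mathcal{A}}$ is obtained by taking the partial trace of $\rho$ over $\mathcal{K}$: $\rho_{\mathcal{A}} = \operatorname{tr}_{\mathcal{K}}\rho$. Contrary to the classical case, pure states over $\mathcal{B}(\mathcal{H} \otimes \mathcal{K})$ that are not elementary tensors (i.e., are not of the form $\psi \otimes \phi$) do not restrict to pure states over $\mathcal{H}$ or over $\mathcal{K}$. Indeed, the restriction to $\mathcal{B}(\mathcal{H})$ of any of the pure states defined in Eqs. (2.5)-(2.8) is the maximally mixed state $(1/2)I$.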
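The partial trace is easy to compute once an operator on $\mathbb{C}^{d_A} \otimes \mathbb{C}^{d_B}$ is stored as a $d_Ad_B \times d_Ad_B$ array with row index $(i,k) \mapsto i\,d_B + k$ (the ordering used by `numpy.kron`, assumed here). The hypothetical helper `partial_trace_B` below reshapes to four indices and sums over the second factor. The sketch checks that the Bell vector $(e_1\otimes e_1 + e_2\otimes e_2)/\sqrt2$ restricts to $(1/2)I$, that the defining identity $\operatorname{tr}(\rho_{\mathcal A}A) = \operatorname{tr}[\rho(A\otimes I)]$ holds, and that a diagonal $\rho$ (a classical joint distribution $p_{xy}$) yields the marginal $\sum_y p_{xy}$.

```python
import numpy as np

def partial_trace_B(rho, dA, dB):
    """Trace over the second factor of an operator on C^dA (x) C^dB (numpy.kron ordering)."""
    return np.einsum("ikjk->ij", rho.reshape(dA, dB, dA, dB))

# Quantum case: the Bell vector (e1 (x) e1 + e2 (x) e2)/sqrt(2)
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(phi_plus, phi_plus.conj())
rho_A = partial_trace_B(rho, 2, 2)
print(np.allclose(rho_A, np.eye(2) / 2))   # maximally mixed

# Defining property: tr(rho_A A) = tr[rho (A (x) I)] for any A
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
print(np.isclose(np.trace(rho_A @ A), np.trace(rho @ np.kron(A, np.eye(2)))))

# Classical case: a diagonal rho holding a joint distribution p_xy
p_xy = np.array([[0.1, 0.2], [0.3, 0.4]])
marginal = np.diag(partial_trace_B(np.diag(p_xy.ravel()), 2, 2))
print(marginal)                            # equals p_xy.sum(axis=1)
```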
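Let $\mathcal{A}$ and $\mathcal{B}$ be algebras of observables. Given the restrictions $\rho_{\mathcal{A}}$ and $\rho_{\mathcal{B}}$, it is generally impossible to reconstruct the state $\rho$ over $\mathcal{A} \otimes \mathcal{B}$ with these restrictions. (Even pure states are not determined by their restrictions: all four Bell states restrict to the maximally mixed state.) When it is known a priori that $\rho$ is pure, however, its structure is described by the following theorem.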

Theorem 2.1.4 (Schmidt decomposition) Let $\psi \in \mathcal{H} \otimes \mathcal{K}$ be a unit vector, and let $\rho_{\mathcal{H}}$ be the restriction of the state $|\psi\rangle\langle\psi|$ to the first system. Let $\rho_{\mathcal{H}} = \sum_i q_i |e_i\rangle\langle e_i|$ be the spectral decomposition of $\rho_{\mathcal{H}}$ with $q_i > 0$. Then there exists an orthonormal system $\{f_i\}$ in $\mathcal{K}$ such that

$$\psi = \sum_i \sqrt{q_i}\, e_i \otimes f_i. \qquad (2.9)$$

Furthermore, the state $|\psi\rangle\langle\psi|$ is entangled if its Schmidt decomposition (2.9) has two or more terms. The number of terms is referred to as the Schmidt number of $\psi$.

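Proof: By definition of the restricted state, we have

$$\operatorname{tr}(\rho_{\mathcal{H}} A) = \langle\psi|(A \otimes I_{\mathcal{K}})\psi\rangle,$$

where $A \in \mathcal{B}(\mathcal{H})$ is an arbitrary operator. Extend $\{e_i\}$ to an orthonormal basis of $\mathcal{H}$ (the added vectors span the kernel of $\rho_{\mathcal{H}}$, and we set $q_i := 0$ for them). Writing $\psi = \sum_i e_i \otimes v_i$, where the vectors $v_i \in \mathcal{K}$ are not normalized, we obtain

$$\operatorname{tr}(\rho_{\mathcal{H}} A) = \sum_{i,j} \langle e_i|Ae_j\rangle\langle v_i|v_j\rangle.$$

We let $A = |e_m\rangle\langle e_n|$ to get $q_m\delta_{mn} = \langle v_m|v_n\rangle$; in particular, $v_i = 0$ whenever $q_i = 0$. Defining $f_i := (1/\sqrt{q_i})\, v_i$ for $q_i > 0$, we obtain $\psi = \sum_i \sqrt{q_i}\, e_i \otimes f_i$, which proves Eq. (2.9).

Now suppose that $\psi$ is a product state. Then the restriction of $|\psi\rangle\langle\psi|$ to the first system is a one-dimensional projection, and hence has only one nonzero eigenvalue, which means that the Schmidt decomposition of $\psi$ has only one term. ■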
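Numerically, the Schmidt decomposition is usually obtained from a singular value decomposition rather than from the spectral decomposition of $\rho_{\mathcal H}$ used in the proof; the two agree because, writing $\psi = \sum_{jk} C_{jk}\, e_j \otimes f_k$, one has $\rho_{\mathcal H} = CC^*$, whose eigenvalues are the squared singular values of $C$. The NumPy sketch below (the function name `schmidt` is ours) returns $\sqrt{q_i}$, $e_i$, and $f_i$ and checks Eq. (2.9) on a random vector in $\mathbb{C}^2 \otimes \mathbb{C}^3$.

```python
import numpy as np

def schmidt(psi, dA, dB, tol=1e-12):
    """Schmidt decomposition psi = sum_i c_i e_i (x) f_i with c_i = sqrt(q_i) > 0,
    computed from the SVD of the coefficient matrix C (psi = sum_jk C_jk |j> (x) |k>)."""
    C = psi.reshape(dA, dB)
    U, s, Vh = np.linalg.svd(C, full_matrices=False)
    keep = s > tol
    return s[keep], U[:, keep].T, Vh[keep, :]   # rows are the e_i and the f_i

rng = np.random.default_rng(2)
psi = rng.normal(size=6) + 1j * rng.normal(size=6)
psi /= np.linalg.norm(psi)

coeffs, es, fs = schmidt(psi, 2, 3)
rebuilt = sum(c * np.kron(e, f) for c, e, f in zip(coeffs, es, fs))
C = psi.reshape(2, 3)
rho_H = C @ C.conj().T                      # restriction of |psi><psi| to C^2

print(np.allclose(rebuilt, psi))                                   # Eq. (2.9)
print(np.allclose(np.sort(coeffs**2), np.linalg.eigvalsh(rho_H)))  # q_i = eigenvalues
print(len(schmidt(np.kron([1.0, 0.0], [0.0, 1.0, 0.0]), 2, 3)[0]))  # product: 1 term
```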

The Schmidt decomposition can also work "in reverse," as the following theorem shows.

Theorem 2.1.5 (purification) Let $\mathcal{H}$ be a Hilbert space. For any state $\rho \in S(\mathcal{H})$ there exist a Hilbert space $\mathcal{K}$ and a pure state $\psi \in \mathcal{H} \otimes \mathcal{K}$, called the purification of $\rho$, such that $\rho = \operatorname{tr}_{\mathcal{K}} |\psi\rangle\langle\psi|$. Furthermore, the restriction $\operatorname{tr}_{\mathcal{H}} |\psi\rangle\langle\psi|$ can be chosen to have no zero eigenvalues, in which case the space $\mathcal{K}$ and the vector $\psi$ are unique up to a unitary transformation.

Proof: Let $\rho = \sum_{i=1}^{k} q_i |e_i\rangle\langle e_i|$ be the spectral decomposition of $\rho$ with $q_i > 0$. Choose $\mathcal{K}$ isomorphic to $\mathbb{C}^k$, and let $\{f_i\}_{i=1}^{k}$ be an orthonormal basis for $\mathcal{K}$. Then the vector $\psi := \sum_{i=1}^{k} \sqrt{q_i}\, e_i \otimes f_i$ is the desired purification. Since the number $k$ and the vectors $e_i$ are uniquely determined by $\rho$, the only freedom in this construction is the orthonormal basis $\{f_i\}$, but any two such bases are connected by a unitary transformation. ■
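The construction in the proof translates directly into code. In this NumPy sketch (helper names hypothetical), $\mathcal K = \mathbb{C}^k$ with its standard basis playing the role of $\{f_i\}$, eigenvalues below a small tolerance are treated as zero, and we confirm that tracing out $\mathcal K$ recovers $\rho$ while the restriction to $\mathcal K$ has no zero eigenvalues.

```python
import numpy as np

def purify(rho, tol=1e-12):
    """psi = sum_i sqrt(q_i) e_i (x) f_i as in the proof, with K = C^k and
    {f_i} its standard basis; eigenvalues below tol are treated as zero."""
    q, vecs = np.linalg.eigh(rho)
    keep = q > tol
    q, vecs = q[keep], vecs[:, keep]
    k = len(q)
    f = np.eye(k)
    psi = sum(np.sqrt(q[i]) * np.kron(vecs[:, i], f[i]) for i in range(k))
    return psi, k

def restrictions(psi, dA, dB):
    """Restrictions of |psi><psi| to the first and to the second factor."""
    C = psi.reshape(dA, dB)
    return C @ C.conj().T, C.T @ C.conj()

rho = np.array([[0.6, 0.1j], [-0.1j, 0.4]])
psi, k = purify(rho)
rho_H, rho_K = restrictions(psi, 2, k)

print(k, np.isclose(np.linalg.norm(psi), 1.0))   # 2 True
print(np.allclose(rho_H, rho))                   # tr_K |psi><psi| = rho
print(np.linalg.eigvalsh(rho_K).min() > 0)       # no zero eigenvalues
```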

With the aid of the Schmidt decomposition, we see that a pure state over $\mathcal{A} \otimes \mathcal{B}$ is separable if and only if it restricts to pure states over both subsystems. (Actually, Theorem 2.1.4 implies the "only if" part; the "if" part is trivial.) The diametrical opposite of this situation is described in the following definition.
Definition 2.1.6 A pure state of a bipartite system is called maximally entangled if it
restricts to maximally mixed states on either subsystem. 

For instance, the states forming the Bell basis are all maximally entangled. We will 
come back to the subject of maximally entangled states in the next section. Here we 
only mention that maximally entangled states are a crucial resource in virtually every 
quantum communication scheme and cryptographic protocol; see the survey by Weinfurter 
and Zeilinger [142] for details. 



After having introduced algebras of observables and states of classical and quantum sys- 
tems, we must provide the mathematical description of any processing performed on these 
systems. This is done by means of the so-called channels. From now on, we will assume 
that all systems under consideration are quantum systems, unless specified otherwise. 

2.2.1 Definitions 

Let us consider the following situation. Suppose that, after some processing on the system with the algebra of observables $\mathcal{A}$, the result is a system with the corresponding algebra $\mathcal{B}$. On this "new" system, we measure an effect $F \in \mathcal{B}$. However, we can also view this sequence of actions as the measurement of some effect $F' \in \mathcal{A}$ on the "old" system. Thus the processing step can be thought of as a transformation $T$ that takes effects in $\mathcal{B}$ to effects in $\mathcal{A}$, $F' = T(F)$ or, in general, as a mapping $T : \mathcal{B} \to \mathcal{A}$ that takes observables in $\mathcal{B}$ to observables in $\mathcal{A}$. Alternatively, we can view the processing step as a transformation $T_*$ that takes states over $\mathcal{A}$ to states over $\mathcal{B}$. Obviously, these two interpretations of the processing step must be equivalent in the statistical sense, so we require that, for any state
$\omega$ over $\mathcal{A}$ and for any observable $X$ in $\mathcal{B}$,

$$(T_*(\omega))(X) = \omega(T(X)), \qquad (2.10)$$

which expresses the statement that the expectation values for the outcome of any measurement must be the same for $T$ and for $T_*$. Sometimes we will use the composition notation $\omega \circ T$ to denote the state defined by $(\omega \circ T)(X) := (T_*(\omega))(X)$.

Already from this simple description we can glean the properties required of the map $T$. First of all, $T$ must map effects to effects, which implies that $T$ must be a positive map, i.e., $X \ge 0$ must imply $T(X) \ge 0$. Secondly, the trivial measurement corresponding to the effect $I_{\mathcal{B}}$ must be mapped to the trivial measurement $I_{\mathcal{A}}$, $T(I_{\mathcal{B}}) = I_{\mathcal{A}}$. These two requirements can be summarized by saying that $T$ must be positive and unital (or unit-preserving). Furthermore, if $\omega$ is a state, then by hypothesis $T_*(\omega)$ is a state also. Hence the left-hand side of Eq. (2.10) is linear in $X$, which means that the right-hand side must also be linear in $X$ for every state $\omega$; since the states separate the elements of $\mathcal{A}$, $T$ itself must be linear. Thus $T$ must be a linear positive unital map $\mathcal{B} \to \mathcal{A}$.

The dual map $T_*$ on states can also be viewed as a map that takes density operators in $\mathcal{A}$ to density operators in $\mathcal{B}$, which allows us to rewrite Eq. (2.10) as

$$\operatorname{tr}[T_*(\rho)X] = \operatorname{tr}[\rho\, T(X)]. \qquad (2.11)$$
Since $T$ is unit-preserving, the linear map $T_*$ must be trace-preserving, $\operatorname{tr} T_*(\rho) = \operatorname{tr}\rho$ [just substitute $I$ for $X$ in Eq. (2.11)], and positive, so that density operators are mapped to density operators.

Mere positivity of the maps $T$ and $T_*$, however, is not sufficient. In many situations we need to consider parallel processing performed on quantum systems, i.e., transformations of the form $S \otimes T : \mathcal{B}_1 \otimes \mathcal{B}_2 \to \mathcal{A}_1 \otimes \mathcal{A}_2$, where $\mathcal{A}_1, \mathcal{A}_2, \mathcal{B}_1, \mathcal{B}_2$ are algebras of observables. In order to represent a physically meaningful processing step, the map $S \otimes T$ must be a linear positive unital map. However, the tensor product of positive maps may fail to be positive, as the following standard example shows [21, p. 192].

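Example 2.2.1 (the transposition map) Let the algebra $\mathcal{A}$ be the space $\mathcal{M}_d$ of $d \times d$ complex matrices. Matrices in $\mathcal{M}_d$ act as operators on the Hilbert space $\mathcal{H} \cong \mathbb{C}^d$. Let $\{e_j\}_{j=1}^{d}$ be an orthonormal basis of $\mathcal{H}$. Consider the transposition map $\Theta : \mathcal{A} \to \mathcal{A}$, that is, the map that sends $|e_j\rangle\langle e_k|$ to $|e_k\rangle\langle e_j|$. Since $\Theta$ leaves each $|e_j\rangle\langle e_j|$ invariant, it is trace-preserving; it is also positive [given a positive operator $X$, write its spectral decomposition to see that $\Theta(X)$ is also positive]. Let us form the map $\Theta \otimes \mathrm{id}$ on $\mathcal{M}_d \otimes \mathcal{M}_d$, where $\mathrm{id}$ is the identity map, in which case we have

$$\Theta \otimes \mathrm{id} : |e_j \otimes e_k\rangle\langle e_l \otimes e_m| \mapsto |e_l \otimes e_k\rangle\langle e_j \otimes e_m|.$$

Now consider the operator

$$A := \sum_{j,k=1}^{d} |e_j \otimes e_j\rangle\langle e_k \otimes e_k|,$$

which is clearly positive, since $A = |w\rangle\langle w|$ with $w := \sum_j e_j \otimes e_j$. Then

$$F := (\Theta \otimes \mathrm{id})(A) = \sum_{j,k=1}^{d} |e_k \otimes e_j\rangle\langle e_j \otimes e_k|$$

is the so-called flip operator on $\mathbb{C}^d \otimes \mathbb{C}^d$, that is, for any pair $\psi, \phi \in \mathbb{C}^d$, $F(\psi \otimes \phi) = \phi \otimes \psi$. The flip operator is manifestly not positive because, for the antisymmetric vector $\chi = \psi \otimes \phi - \phi \otimes \psi$, we see that $F\chi = -\chi$. Choosing $\psi$ and $\phi$ linearly independent (possible for $d \ge 2$) makes $\chi \neq 0$, so the operator $F$ has the negative eigenvalue $-1$, and therefore cannot be positive. □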
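The computation in Example 2.2.1 can be replayed numerically. In the NumPy sketch below (the helper `transpose_first` is ours), an operator on $\mathbb{C}^d \otimes \mathbb{C}^d$ is reshaped to indices $(j,k,l,m)$ for $|e_j \otimes e_k\rangle\langle e_l \otimes e_m|$, and $\Theta \otimes \mathrm{id}$ swaps $j$ and $l$. For $d = 3$ it confirms that $A$ is positive while $(\Theta\otimes\mathrm{id})(A)$ acts as the flip and has eigenvalue $-1$.

```python
import numpy as np

def transpose_first(X, d):
    """(Theta (x) id)(X) on C^d (x) C^d: |e_j e_k><e_l e_m| -> |e_l e_k><e_j e_m|."""
    return X.reshape(d, d, d, d).transpose(2, 1, 0, 3).reshape(d * d, d * d)

d = 3
basis = np.eye(d)
w = sum(np.kron(basis[j], basis[j]) for j in range(d))
A = np.outer(w, w)                 # A = sum_{j,k} |e_j (x) e_j><e_k (x) e_k|
F = transpose_first(A, d)          # the flip operator

rng = np.random.default_rng(3)
psi, phi = rng.normal(size=d), rng.normal(size=d)
chi = np.kron(psi, phi) - np.kron(phi, psi)   # antisymmetric vector

print(np.linalg.eigvalsh(A).min() >= -1e-12)                  # A is positive
print(np.allclose(F @ np.kron(psi, phi), np.kron(phi, psi)))  # F swaps factors
print(np.allclose(F @ chi, -chi))                             # eigenvalue -1
print(np.linalg.eigvalsh(F).min())                            # -1.0 up to rounding
```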

The above example shows that, even if a map $T$ is positive, the map $T \otimes \mathrm{id}$ may already fail to be positive, which in turn shows that tensor products of positive maps need not be positive maps. This is clearly unacceptable for the mathematical model of
a channel. A good way out of this difficulty is to restrict the class of admissible maps to 
include only the so-called completely positive maps [97, p. 25]. 

Definition 2.2.2 Let $T : \mathcal{A} \to \mathcal{B}$ be a map between operator algebras. Define the map $T_n : \mathcal{A} \otimes \mathcal{M}_n \to \mathcal{B} \otimes \mathcal{M}_n$ via $T_n := T \otimes \mathrm{id}_n$. Then $T$ is called $n$-positive if $T_n$ is a positive map. A map that is $n$-positive for all values of $n$ is termed completely positive.
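For maps on $\mathcal{M}_d$ there is a convenient finite test that is not stated in the text: by Choi's theorem, $T$ is completely positive if and only if the Choi matrix $\sum_{j,k} |e_j\rangle\langle e_k| \otimes T(|e_j\rangle\langle e_k|)$ is positive semidefinite (in particular, $d$-positivity already implies complete positivity). The sketch below (NumPy, with hypothetical helper names, a hypothetical matrix $V$, and a tolerance chosen for illustration) applies this test to the transposition of Example 2.2.1 and to a conjugation $X \mapsto VXV^*$.

```python
import numpy as np

def choi_matrix(T, d):
    """sum_{j,k} |e_j><e_k| (x) T(|e_j><e_k|) for a linear map T on d x d matrices."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for j in range(d):
        for k in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[j, k] = 1.0
            C += np.kron(E, T(E))
    return C

def is_completely_positive(T, d, tol=1e-10):
    """Choi's theorem: T is CP iff its Choi matrix is positive semidefinite.
    Assumes T maps Hermitian matrices to Hermitian matrices, so eigvalsh applies."""
    return bool(np.linalg.eigvalsh(choi_matrix(T, d)).min() >= -tol)

V = np.array([[1.0, 2.0], [0.0, 1.0j]])
transpose = lambda X: X.T
conjugation = lambda X: V @ X @ V.conj().T

print(is_completely_positive(transpose, 2))    # False
print(is_completely_positive(conjugation, 2))  # True
```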

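Now suppose that $S : \mathcal{B}_1 \to \mathcal{A}_1$ and $T : \mathcal{B}_2 \to \mathcal{A}_2$ are completely positive maps. Let $m$ and $n$ be the dimensions of the Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$, where $\mathcal{B}_2$ and $\mathcal{A}_1$ are subalgebras of $\mathcal{B}(\mathcal{H})$ and $\mathcal{B}(\mathcal{K})$ respectively. Then the maps $S \otimes \mathrm{id}_m : \mathcal{B}_1 \otimes \mathcal{B}_2 \to \mathcal{A}_1 \otimes \mathcal{B}_2$ and $\mathrm{id}_n \otimes T : \mathcal{A}_1 \otimes \mathcal{B}_2 \to \mathcal{A}_1 \otimes \mathcal{A}_2$ are positive. Hence their composition, $S \otimes T = (\mathrm{id}_n \otimes T) \circ (S \otimes \mathrm{id}_m)$, is well-defined and positive.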




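$$\mathcal{B}_1 \otimes \mathcal{B}_2 \xrightarrow{\; S \otimes \mathrm{id}_m \;} \mathcal{A}_1 \otimes \mathcal{B}_2 \xrightarrow{\; \mathrm{id}_n \otimes T \;} \mathcal{A}_1 \otimes \mathcal{A}_2$$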



This observation, pictured on the diagram above, motivates the following definition. 

Definition 2.2.3 A channel converting systems with the algebra of observables $\mathcal{A}$ into systems with the algebra of observables $\mathcal{B}$ is a completely positive unital linear map $T : \mathcal{B} \to \mathcal{A}$. The dual map $T_*$, related to $T$ via Eq. (2.10), is then a completely positive trace-preserving linear map, and is referred to as the dual channel.

Remark: We have been somewhat cavalier in our definition of the dual channel $T_*$ acting on states through the channel $T$ acting on observables, having ignored certain technicalities that arise when the Hilbert space $\mathcal{H}$ is infinite-dimensional. These complications disappear in the finite-dimensional case, so we will not dwell on this point any further. □

We say that the channel $T$ corresponds to the Heisenberg picture of quantum dynamics, whereas the dual channel $T_*$ describes the Schrödinger picture. This generalizes the notions of the Heisenberg and the Schrödinger pictures, studied in introductory courses on quantum mechanics.



2.2.2 Examples 

It turns out that all physically meaningful examples of channels can be constructed by putting together certain basic building blocks. We will get to this issue in a moment, but first we will provide several examples of completely positive maps in general, and channels in particular. These examples can be found in Werner's survey [144], but here we fill in the missing details.

Example 2.2.4 (*-homomorphisms) Let $\mathcal{A}$ and $\mathcal{B}$ be C*-algebras, and consider a *-homomorphism $\pi : \mathcal{A} \to \mathcal{B}$. We know that *-homomorphisms map positive elements to positive elements, hence $\pi$ is a positive map. Let us consider the map $\pi \otimes \mathrm{id}_n$ that maps $\mathcal{A} \otimes \mathcal{M}_n$ to $\mathcal{B} \otimes \mathcal{M}_n$. The tensor product $\mathcal{A} \otimes \mathcal{M}_n$ is isomorphic to the algebra $\mathcal{M}_n(\mathcal{A})$ of $n \times n$ matrices with $\mathcal{A}$-valued entries; this follows from noting that any element of $\mathcal{A} \otimes \mathcal{M}_n$ can be written in the form $\sum_{i,j=1}^{n} A_{ij} \otimes e_{ij}$, where $A_{ij} \in \mathcal{A}$ and $e_{ij}$ is the matrix unit with $1$ in the $(i,j)$th position and $0$ elsewhere. Thus it is natural to identify the element $A_{ij} \in \mathcal{A}$ with the $(i,j)$th entry of an $n \times n$ $\mathcal{A}$-valued matrix. The product of elements in $\mathcal{A} \otimes \mathcal{M}_n$ is given by

$$\Big(\sum_{i,j} A_{ij} \otimes e_{ij}\Big)\Big(\sum_{k,l} B_{kl} \otimes e_{kl}\Big) = \sum_{i,l} \Big(\sum_{j} A_{ij}B_{jl}\Big) \otimes e_{il},$$

where the $(i,l)$th entry is given by the usual laws of matrix multiplication, but with elements of $\mathcal{A}$ instead of complex numbers. The other operations are defined similarly. Furthermore, the action of the map $\pi \otimes \mathrm{id}_n$ on an element of $\mathcal{M}_n(\mathcal{A})$ amounts to the entrywise application of the *-homomorphism $\pi$. It is an easy task to show that the resulting map is also a *-homomorphism, and hence positive. This shows that *-homomorphisms between C*-algebras are completely positive. □

Example 2.2.5 (conjugations) Let $\mathcal{H}$ and $\mathcal{K}$ be Hilbert spaces, and let $V : \mathcal{H} \to \mathcal{K}$ be a bounded operator. Then the map $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{K})$ defined by $T(X) = VXV^*$ is completely positive. First of all, $T$ is obviously positive. Indeed, given $X \ge 0$, there exists $Y$ such that $X = Y^*Y$, which implies that $T(X) = VY^*YV^* = (YV^*)^*(YV^*) \ge 0$. Now, if $X = Y^*Y$ is a positive element of $\mathcal{B}(\mathcal{H}) \otimes \mathcal{M}_n$, then similarly $(T \otimes \mathrm{id}_n)(X) = (V \otimes I)X(V^* \otimes I) = (Y(V^* \otimes I))^*(Y(V^* \otimes I)) \ge 0$. This holds for all $n$, hence $T$ is completely positive. This example covers the special case of unitary conjugations, i.e., the case when $V$ is a unitary operator. Because $VV^* = V^*V = I$ for a unitary $V$, the corresponding conjugation is also a channel. □

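Example 2.2.6 (restriction) Let $\mathcal{A}$ and $\mathcal{B}$ be algebras, and consider the map $M_{\mathcal{B}} : \mathcal{A} \to \mathcal{A} \otimes \mathcal{B}$ defined by $M_{\mathcal{B}}(A) = A \otimes I$. This map is clearly completely positive and unital. Let us pass to the Schrödinger picture, where we expect that the dual channel $M_{\mathcal{B}*}$ is the operation of taking the partial trace over the second system. Indeed, with $\mathcal{A} = \mathcal{B}(\mathcal{H})$, $\mathcal{B} = \mathcal{B}(\mathcal{K})$, and orthonormal bases $\{e_i\}$ of $\mathcal{H}$ and $\{f_k\}$ of $\mathcal{K}$, consider a density operator

$$\rho = \sum_{i,j,k,l} \rho_{ik,jl}\, |e_i \otimes f_k\rangle\langle e_j \otimes f_l|.$$

In the duality relation (2.11), let $X$ be the matrix unit $|e_q\rangle\langle e_p|$. Then we obtain

$$\big(M_{\mathcal{B}*}(\rho)\big)_{pq} = \operatorname{tr}\big[M_{\mathcal{B}*}(\rho)\, |e_q\rangle\langle e_p|\big] = \operatorname{tr}\big[\rho\,(|e_q\rangle\langle e_p| \otimes I)\big] = \sum_{k} \rho_{pk,qk},$$

which is precisely the $(p,q)$th matrix element of the partial trace of $\rho$ over the second system. Thus $M_{\mathcal{B}*} = \operatorname{tr}_{\mathcal{B}}$. □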

Example 2.2.7 (expansion) A common operation in quantum information theory is, given a system in some state $\rho$, to adjoin an auxiliary system in some fixed state $\rho_0$. In the Schrödinger picture, this operation is a channel, and has the form $T_*(\rho) = \rho \otimes \rho_0$. Let us determine the corresponding channel in the Heisenberg picture. Let the two systems have $\mathcal{A}$ and $\mathcal{B}$ respectively as their algebras of observables. The sought channel is a map from $\mathcal{A} \otimes \mathcal{B}$ to $\mathcal{A}$. Because any $X \in \mathcal{A} \otimes \mathcal{B}$ can be written in the form $X = \sum_i A_i \otimes B_i$, where $A_i \in \mathcal{A}$ and $B_i \in \mathcal{B}$, the action of the channel $T$ is determined by its effect on the elementary tensors $A \otimes B$. From the duality relation (2.11), we have $\operatorname{tr}[(\rho \otimes \rho_0)(A \otimes B)] = \operatorname{tr}[\rho\, T(A \otimes B)]$, which can be rewritten as $\operatorname{tr}(\rho A)\operatorname{tr}(\rho_0 B) = \operatorname{tr}[\rho\, T(A \otimes B)]$. This must hold for an arbitrary density operator $\rho$, which implies that $T(A \otimes B) = [\operatorname{tr}(\rho_0 B)]A$. The action of $T$ can be extended to the whole of $\mathcal{A} \otimes \mathcal{B}$ by linearity. Complete positivity follows from the fact that $T_*$ is completely positive, and therefore so is its dual map $T$. □

Example 2.2.8 (measurement) A measurement can be thought of as a channel that converts quantum systems into classical systems. Let $\mathcal{X}$ be the set of the measurement outcomes. Then the act of measurement can be represented by a mapping $T : C(\mathcal{X}) \to \mathcal{A}$, where $\mathcal{A}$ is the algebra of observables of the quantum system. The channel $T$ is obviously determined by the operators $F_x := T(e_x)$, $x \in \mathcal{X}$. It is a basic result in the theory of completely positive maps that any positive map $T : C(\mathcal{X}) \to \mathcal{A}$, where $\mathcal{X}$ is a compact set and $\mathcal{A}$ is an operator algebra, is automatically completely positive [21, p. 192]. Thus we must have $F_x \ge 0$. Furthermore, because $T$ must be unital, the operators $F_x$ must form a resolution of identity on $\mathcal{A}$, i.e., $\sum_x F_x = I$. The application of $T_*$ to a density operator $\rho$ yields a function $f(x) = \operatorname{tr}(\rho F_x)$, i.e., the probability of obtaining the outcome $x$ when the system is in the state $\rho$. The collection $\{F_x\}$ with $F_x \ge 0$ and $\sum_x F_x = I$ is an example of a positive operator-valued measure (POVM). We will discuss POVMs in greater detail in Sec. 2.3.3, when we talk about quantum detection theory. The "old-school" projective (von Neumann-Lüders) measurement obtains when the effects $F_x$ have the property $F_xF_y = \delta_{xy}F_x$. □
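To make the POVM conditions concrete, here is a small NumPy sketch built around a hypothetical qubit example, the so-called trine POVM $F_k = \tfrac{2}{3}|\phi_k\rangle\langle\phi_k|$ with $\phi_k = (\cos(2\pi k/3), \sin(2\pi k/3))$; the function names and the test state are ours. It checks $F_k \ge 0$ and $\sum_k F_k = I$, computes the outcome probabilities $\operatorname{tr}(\rho F_k)$, and shows that these effects, unlike a measurement in the standard basis, are not projective.

```python
import numpy as np

def trine_povm():
    """Hypothetical example: F_k = (2/3)|phi_k><phi_k| with phi_k at angles 2*pi*k/3."""
    effects = []
    for k in range(3):
        phi = np.array([np.cos(2 * np.pi * k / 3), np.sin(2 * np.pi * k / 3)])
        effects.append((2 / 3) * np.outer(phi, phi))
    return effects

def outcome_probabilities(rho, effects):
    """f(x) = tr(rho F_x) for each effect F_x."""
    return np.array([np.trace(rho @ F).real for F in effects])

def is_projective(effects):
    """Check F_x F_y = delta_xy F_x for all pairs of effects."""
    return all(np.allclose(Fx @ Fy, Fx if x == y else 0)
               for x, Fx in enumerate(effects)
               for y, Fy in enumerate(effects))

trine = trine_povm()
computational = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
rho = np.array([[0.8, 0.1], [0.1, 0.2]])

print(np.allclose(sum(trine), np.eye(2)))                         # resolution of identity
print(all(np.linalg.eigvalsh(F).min() >= -1e-12 for F in trine))  # F_x >= 0
p = outcome_probabilities(rho, trine)
print(p, p.sum())                                                 # sums to 1
print(is_projective(trine), is_projective(computational))         # False True
```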

Example 2.2.9 (irreversible quantum dynamics) In Example 2.2.5, we have considered the case of unitarily implemented channels. Such channels arise whenever we talk about reversible quantum dynamics. A general theory of irreversible quantum dynamics proceeds as follows [31]. The system, initially in some state $\rho \in S(\mathcal{H})$, is brought into contact with another system, the reservoir, initially in some fixed state $\rho_R \in S(\mathcal{K})$, where $\mathcal{K}$ is the Hilbert space of the reservoir. The combined "system + reservoir" entity is assumed to be closed. Then the two are caused to interact by means of a unitarily implemented channel, and the final state of the system is obtained by tracing out the reservoir degrees of freedom. In the Schrödinger picture, this irreversible evolution of the system is given by the channel $T_*(\rho) = \operatorname{tr}_{\mathcal{K}}\, U(\rho \otimes \rho_R)U^*$, where $U$ is the unitary operator on $\mathcal{H} \otimes \mathcal{K}$ implementing the interaction. □
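A minimal instance of this construction, chosen purely for illustration: the system and reservoir are qubits, the reservoir starts in $\rho_R = |e_1\rangle\langle e_1|$ (the first standard basis vector), and $U$ is the controlled-NOT gate with the system as control. Tracing out the reservoir then erases the off-diagonal entries of $\rho$, an irreversible dephasing. The NumPy sketch below (helper names and the test state are ours) computes $T_*(\rho) = \operatorname{tr}_{\mathcal K}\,U(\rho\otimes\rho_R)U^*$ and checks this.

```python
import numpy as np

def partial_trace_second(rho, dA, dB):
    """Trace out the second factor of an operator on C^dA (x) C^dB (numpy.kron ordering)."""
    return np.einsum("ikjk->ij", rho.reshape(dA, dB, dA, dB))

# Hypothetical example: CNOT with the system (first factor) as control,
# reservoir (second factor) prepared in |e1><e1|.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
rho_R = np.array([[1, 0], [0, 0]], dtype=complex)

def T_star(rho, U=CNOT, rho_R=rho_R):
    """T_*(rho) = tr_R[U (rho (x) rho_R) U^*]."""
    joint = U @ np.kron(rho, rho_R) @ U.conj().T
    return partial_trace_second(joint, rho.shape[0], rho_R.shape[0])

rho = np.array([[0.6, 0.3 - 0.2j], [0.3 + 0.2j, 0.4]])
out = T_star(rho)
print(np.allclose(CNOT @ CNOT.conj().T, np.eye(4)))   # U is unitary
print(np.allclose(out, np.diag(np.diag(rho))))       # off-diagonal entries erased
print(np.isclose(np.trace(out), np.trace(rho)))      # trace preserved
```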

Finally we give one more example, which has nothing to do with quantum informa- 
tion theory per se, but rather serves to demonstrate the all-encompassing nature of the 
definition of the channel. 

Example 2.2.10 (classical channel) A classical channel is, roughly speaking, a transformation that converts classical systems into classical systems. Hence a positive unital map $T : C(\mathcal{X}) \to C(\mathcal{Y})$ is a classical channel, which is uniquely determined by the functions $f_x := T(e_x) \in C(\mathcal{Y})$. The dual map $T_*$ converts states over $C(\mathcal{Y})$ into states over $C(\mathcal{X})$ or, equivalently, probability measures on $\mathcal{Y}$ into probability measures on $\mathcal{X}$. Specifically, we can expand $f_x = \sum_y f_{xy} e_y$, so that, for any function $g \in C(\mathcal{X})$, we have $T(g) = \sum_{x,y} g(x) f_{xy} e_y$. If $p = \{p_y\}$ is a probability measure on $\mathcal{Y}$, the duality relation (2.10) says that

$$\sum_y p_y\, (T(g))(y) = \sum_x (T_*(p))_x\, g(x),$$

from which we get, upon expanding,

$$\sum_{x,y} p_y f_{xy}\, g(x) = \sum_x (T_*(p))_x\, g(x).$$

Comparing coefficients, we obtain $(T_*(p))_x = \sum_y f_{xy} p_y$. The nonnegative numbers $f_{xy}$ form the transition matrix of the channel, where $f_{xy}$ is the conditional probability $p(x|y)$ that the symbol $x$ is received given that the symbol $y$ was transmitted. Because $T$ is a channel, it is unital, i.e., $T(I) = \sum_x T(e_x) = I = \sum_y e_y$. But $\sum_x f_x = \sum_{x,y} f_{xy} e_y$, so we see, comparing coefficients, that $\sum_x f_{xy} = 1$, i.e., the columns of the transition matrix add up to one. □
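In matrix form, $T_*$ is multiplication of a probability vector by the column-stochastic matrix $(f_{xy})$, while $T$ acts on functions through the transpose. The NumPy sketch below uses a hypothetical Z-channel (a transmitted $y = 1$ is received as $x = 0$ with probability $\varepsilon$) because its rows do not sum to one, which makes the column condition visible; it also checks the duality relation (2.10).

```python
import numpy as np

eps = 0.1   # hypothetical crossover probability
# f[x, y] = p(x | y); Z-channel: y = 0 is always received correctly,
# y = 1 is received as x = 0 with probability eps.
f = np.array([[1.0, eps],
              [0.0, 1.0 - eps]])

def T(g):
    """Heisenberg picture: (T g)(y) = sum_x g(x) f_xy."""
    return f.T @ g

def T_star(p):
    """Schroedinger picture: (T_* p)_x = sum_y f_xy p_y."""
    return f @ p

p = np.array([0.25, 0.75])   # probability measure on Y
g = np.array([2.0, -1.0])    # an observable (function) on X

print(f.sum(axis=0), f.sum(axis=1))         # columns sum to 1, rows need not
print(np.allclose(T(np.ones(2)), 1.0))      # T is unital
print(np.isclose(p @ T(g), T_star(p) @ g))  # duality relation (2.10)
print(T_star(p), T_star(p).sum())           # output distribution on X
```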

2.2.3 The theorems of Stinespring and Kraus 

Up to this point, our treatment of channels has been largely axiomatic. However, we can adopt the pragmatic point of view and demand that only those transformations that can be built up from certain basic blocks can serve as channels. We take our cue from the quantum theory of open systems [31] and say that any "physically acceptable" channel can be realized as a sequence of the following steps: (a) adjunction of an auxiliary system (called the ancilla$^\dagger$ in the terminology of Helstrom [58]) in some fixed initial state, (b) unitarily implemented evolution of the enlarged system, and (c) restriction to the original subsystem. In other words, any channel must be of the form described in Example 2.2.9. Luckily it turns out that the two descriptions coincide; this is ultimately a consequence of the Stinespring theorem [129], which we first state, without proof, in the form given by Paulsen [97, p. 43].

Theorem 2.2.11 (Stinespring) Let $\mathcal{A}$ be a C*-algebra with identity, and let $\mathcal{H}$ be a Hilbert space. Then a linear map $T : \mathcal{A} \to \mathcal{B}(\mathcal{H})$ is completely positive if and only if there exist a Hilbert space $\mathcal{K}$, a unital *-homomorphism $\pi : \mathcal{A} \to \mathcal{B}(\mathcal{K})$, and a bounded operator $V : \mathcal{H} \to \mathcal{K}$ with $\|V\|^2 = \|T(I)\|$ such that

$$T(A) = V^*\pi(A)V \qquad (2.12)$$

for any $A \in \mathcal{A}$. We will refer either to Eq. (2.12) or to the triple $(\mathcal{K}, V, \pi)$ as the Stinespring decomposition of $T$.

It immediately follows from the Stinespring theorem that if $T$ is also a unital map, then $V$ is an isometry, i.e., $V^*V = I_{\mathcal{H}}$. The Stinespring theorem has a useful specialization [37, p. 15], [131, p. 222] to the case when the algebra $\mathcal{A}$ is an algebra of operators in a Hilbert space.

Theorem 2.2.12 (Stinespring; the Hilbert-space version) Let $\mathcal{H}$ and $\mathcal{H}_1$ be Hilbert spaces, and let $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H})$ be a completely positive map with the Stinespring decomposition $(\mathcal{K}, V, \pi)$. Then there exist a Hilbert space $\mathcal{H}_2$ and a unitary operator $U : \mathcal{K} \to \mathcal{H}_1 \otimes \mathcal{H}_2$ such that, for any $A \in \mathcal{B}(\mathcal{H}_1)$,

$$T(A) = V^*U^*(A \otimes I_{\mathcal{H}_2})UV. \qquad (2.13)$$

$^\dagger$Latin for "housemaid"; we choose not to dwell on the philosophical implications of this!
We can absorb the unitary $U$ and the mapping $V$ into a single mapping to obtain the following corollary.

Corollary 2.2.13 Let $\mathcal{H}$ and $\mathcal{K}$ be Hilbert spaces, and let $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{K})$ be a completely positive map. Then there exist a Hilbert space $\mathcal{S}$ and a bounded map $V : \mathcal{K} \to \mathcal{H} \otimes \mathcal{S}$ such that

$$T(A) = V^*(A \otimes I_{\mathcal{S}})V \qquad (2.14)$$

for all $A \in \mathcal{B}(\mathcal{H})$. Furthermore, if $T$ is unital, then $V$ is an isometry.

The following result [71], which carries a great deal of significance in quantum information theory, is a consequence of the Stinespring theorem. We provide the proof because it is instructive, and because we will come to rely on some of the techniques used in it.

Theorem 2.2.14 (the Kraus representation) Let $\mathcal{H}$ and $\mathcal{K}$ be Hilbert spaces, and let $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{K})$ be a completely positive map. Then there exist bounded operators $V_\alpha : \mathcal{K} \to \mathcal{H}$ such that

$$T(A) = \sum_\alpha V_\alpha^* A V_\alpha \qquad (2.15)$$

for all $A \in \mathcal{B}(\mathcal{H})$, where the sum in Eq. (2.15) converges in the strong operator topology. Furthermore, if the map $T$ is unital, then $\sum_\alpha V_\alpha^* V_\alpha = I_{\mathcal{K}}$. The collection of operators $\{V_\alpha\}$ will be referred to as the Kraus decomposition of $T$.

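Proof: Let $\mathcal{S}$ and $V$ be given by Eq. (2.14). Now let $\{\xi_\alpha\}$ be an orthonormal basis of $\mathcal{S}$. Then, given any $\psi \in \mathcal{K}$, we can expand

$$V\psi = \sum_\alpha V_\alpha\psi \otimes \xi_\alpha, \qquad (2.16)$$

where $V_\alpha : \mathcal{K} \to \mathcal{H}$ are some operators. Let $\chi$ be an arbitrary vector in $\mathcal{K}$. Then the action of the adjoint $V^*$ on elementary tensors $\phi \otimes \zeta \in \mathcal{H} \otimes \mathcal{S}$ can be read off from

$$\langle\chi|V^*(\phi \otimes \zeta)\rangle = \langle V\chi|\phi \otimes \zeta\rangle = \sum_\alpha \langle V_\alpha\chi|\phi\rangle\langle\xi_\alpha|\zeta\rangle = \sum_\alpha \langle\chi|V_\alpha^*\phi\rangle\langle\xi_\alpha|\zeta\rangle,$$

which yields

$$V^*(\phi \otimes \zeta) = \sum_\alpha \langle\xi_\alpha|\zeta\rangle\, V_\alpha^*\phi. \qquad (2.17)$$

Now let $\psi$ be an arbitrary vector in $\mathcal{K}$. For an arbitrary operator $A \in \mathcal{B}(\mathcal{H})$, we write

$$T(A)\psi = V^*(A \otimes I_{\mathcal{S}})V\psi = V^*(A \otimes I_{\mathcal{S}})\sum_\alpha V_\alpha\psi \otimes \xi_\alpha = V^*\Big(\sum_\alpha AV_\alpha\psi \otimes \xi_\alpha\Big) = \sum_{\alpha,\beta} \langle\xi_\beta|\xi_\alpha\rangle\, V_\beta^* A V_\alpha\psi = \sum_\alpha V_\alpha^* A V_\alpha\psi,$$

which is Eq. (2.15). Now if $T$ is unital, then $V$ is an isometry, which implies the normalization condition $\sum_\alpha V_\alpha^* V_\alpha = I_{\mathcal{K}}$. ■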
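Given a completely positive map $T$, its Kraus decomposition is obviously not unique. As shown in the proof of Theorem 2.2.14, the operators $V_\alpha$ are determined by the map $V$ and by the orthonormal basis $\{\xi_\alpha\}$ of $\mathscr{S}$. Thus we have the freedom of choosing the basis of $\mathscr{S}$; let $\{\eta_\alpha\}$ be some other orthonormal basis, related to $\{\xi_\alpha\}$ by the unitary transformation $U$ with $U\xi_\alpha = \eta_\alpha$, and let $u_{\alpha\beta} := \langle \eta_\alpha | \xi_\beta \rangle$ be the corresponding unitary change-of-basis matrix, so that $\xi_\beta = \sum_\alpha u_{\alpha\beta}\eta_\alpha$. Then, for any $\psi \in \mathscr{H}$, we can expand $V\psi$ as

$$V\psi = \sum_\beta V_\beta\psi \otimes \xi_\beta = \sum_{\alpha,\beta} u_{\alpha\beta} V_\beta\psi \otimes \eta_\alpha = \sum_\alpha W_\alpha\psi \otimes \eta_\alpha,$$

where $W_\alpha := \sum_\beta u_{\alpha\beta} V_\beta$, and it is clear that both sets $\{V_\alpha\}$ and $\{W_\alpha\}$ form Kraus decompositions of $T$.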
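The freedom $W_\alpha = \sum_\beta u_{\alpha\beta}V_\beta$ can be checked numerically. The following minimal Python sketch (standard library only, matrices as nested lists) assumes a hypothetical qubit dephasing channel with Kraus operators $\sqrt{1-p}\,I$ and $\sqrt{p}\,Z$ and mixes them with the Hadamard matrix. Both sets should produce the same state-space action $\rho \mapsto \sum_\alpha V_\alpha\rho V_\alpha^*$, and the mixed set should still satisfy $\sum_\alpha W_\alpha^*W_\alpha = I$. The helper names (`apply_dual`, `mix_kraus`) are illustrative, not taken from the text.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose A^*."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def add(A, B):
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def zeros(n, m):
    return [[0j] * m for _ in range(n)]

def apply_dual(kraus, rho):
    """State-space action rho -> sum_a V_a rho V_a^*."""
    out = zeros(len(kraus[0]), len(kraus[0]))
    for V in kraus:
        out = add(out, matmul(matmul(V, rho), dagger(V)))
    return out

def mix_kraus(u, kraus):
    """New Kraus operators W_a = sum_b u[a][b] V_b for a unitary matrix u."""
    mixed = []
    for row in u:
        W = zeros(len(kraus[0]), len(kraus[0][0]))
        for coeff, V in zip(row, kraus):
            W = add(W, scale(coeff, V))
        mixed.append(W)
    return mixed

p = 0.3                                    # hypothetical dephasing strength
I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
kraus_V = [scale(math.sqrt(1 - p), I2), scale(math.sqrt(p), Z)]

h = 1 / math.sqrt(2)
u = [[h, h], [h, -h]]                      # a unitary (Hadamard) change of basis
kraus_W = mix_kraus(u, kraus_V)

rho = [[0.6, 0.2 - 0.3j], [0.2 + 0.3j, 0.4]]   # an arbitrary qubit density matrix
out_V = apply_dual(kraus_V, rho)
out_W = apply_dual(kraus_W, rho)

# Normalization sum_a W_a^* W_a = I for the mixed set.
completeness_W = zeros(2, 2)
for W in kraus_W:
    completeness_W = add(completeness_W, matmul(dagger(W), W))
```

The cross terms $V_1\rho V_2^*$ and $V_2\rho V_1^*$ cancel between $W_1$ and $W_2$, which is exactly why the two decompositions agree.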

Now, if $T : B(\mathscr{K}) \to B(\mathscr{H})$ is a channel, then the dual map $T_*$ transforms density operators on $\mathscr{H}$ to density operators on $\mathscr{K}$. Let $\{V_\alpha\}$ be the Kraus decomposition of $T$. Then the duality relation (2.11) implies that, for any density operator $\rho$ on $\mathscr{H}$, we have

$$T_*(\rho) = \sum_\alpha V_\alpha \rho V_\alpha^*. \qquad (2.18)$$

It follows from Eq. (2.18) that the dual channel $T_*$ can be extended to all trace-class operators on $\mathscr{H}$, because any trace-class operator can be written as a complex linear combination of four density operators.
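The step from the Kraus form (2.15) to Eq. (2.18) can be spelled out in one line. This sketch assumes that the duality relation (2.11) reads $\operatorname{tr}[\rho\,T(A)] = \operatorname{tr}[T_*(\rho)A]$ for all $A \in B(\mathscr{K})$ and all density operators $\rho$ on $\mathscr{H}$, and that all sums are finite (as in finite dimensions):

```latex
\operatorname{tr}[\rho\,T(A)]
  = \sum_\alpha \operatorname{tr}\big(\rho\,V_\alpha^* A V_\alpha\big)
  = \sum_\alpha \operatorname{tr}\big(V_\alpha \rho V_\alpha^*\,A\big)
  = \operatorname{tr}\Big[\Big(\sum_\alpha V_\alpha \rho V_\alpha^*\Big) A\Big]
  \qquad \text{for all } A \in B(\mathscr{K}).
```

The middle equality is the cyclicity of the trace. Since an operator on $\mathscr{K}$ is determined by its traces against all $A \in B(\mathscr{K})$, this forces $T_*(\rho) = \sum_\alpha V_\alpha \rho V_\alpha^*$.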

Finally, after all these tedious preparations, we are ready to state and prove the result, 
due to Kraus [71], that any channel can be represented in the ancilla form. 

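Theorem 2.2.15 (ancilla form) Let $T : B(\mathscr{K}) \to B(\mathscr{H})$ be a channel. Then there exist Hilbert spaces $\mathscr{A}$ and $\mathscr{W}$, a unit vector $\Omega \in \mathscr{W}$, and a unitary transformation $U : \mathscr{H} \otimes \mathscr{W} \to \mathscr{K} \otimes \mathscr{A}$ such that, for any density operator $\rho$ on $\mathscr{H}$,

$$T_*(\rho) = \operatorname{tr}_{\mathscr{A}} U(\rho \otimes |\Omega\rangle\langle\Omega|)U^*. \qquad (2.19)$$

Proof: Let $V$ and $\mathscr{S}$ be given by Eq. (2.14), and let $\mathscr{A}$ and $\mathscr{W}$ be Hilbert spaces such that $\mathscr{H} \otimes \mathscr{W} \cong \mathscr{K} \otimes \mathscr{A}$ and $\dim\mathscr{S} \le \dim\mathscr{A}$. Now pick a unit vector $\Omega \in \mathscr{W}$ and consider the map

$$U(\psi \otimes \Omega) := V\psi \qquad (2.20)$$

for all $\psi \in \mathscr{H}$. The vector on the right-hand side of Eq. (2.20) is an element of $\mathscr{K} \otimes \mathscr{S}$, hence an element of $\mathscr{K} \otimes \mathscr{A}$ because $\mathscr{S}$ is, by construction, isomorphic to a subspace of $\mathscr{A}$. Now if $\{e_i\}$ is an orthonormal basis of $\mathscr{H}$, then the vectors $U(e_i \otimes \Omega)$ form an orthonormal system in $\mathscr{K} \otimes \mathscr{A}$ because

$$\langle U(e_i \otimes \Omega) | U(e_j \otimes \Omega) \rangle = \langle Ve_i | Ve_j \rangle = \langle e_i | V^*Ve_j \rangle = \langle e_i | e_j \rangle = \delta_{ij},$$

where we have used the fact that $T$ is a channel, and therefore $V$ is an isometry. Hence $U$ can be extended to a unitary map $U : \mathscr{H} \otimes \mathscr{W} \to \mathscr{K} \otimes \mathscr{A}$. Furthermore, because $\mathscr{S}$ is isomorphic to a subspace of $\mathscr{A}$, we can express the action of $U$ on the vectors of the form $\psi \otimes \Omega$ using a Kraus decomposition $\{V_\alpha\}$ of $T$ as

$$U(\psi \otimes \Omega) = \sum_\alpha V_\alpha\psi \otimes \xi_\alpha,$$

where $\{\xi_\alpha\}$ is an orthonormal basis of $\mathscr{S}$, determined by $\{V_\alpha\}$ (cf. the proof of Theorem 2.2.14). Then, for any $\psi \in \mathscr{H}$, we have

$$\operatorname{tr}_{\mathscr{A}} U(|\psi\rangle\langle\psi| \otimes |\Omega\rangle\langle\Omega|)U^* = \operatorname{tr}_{\mathscr{A}} \sum_{\alpha,\beta} V_\alpha|\psi\rangle\langle\psi|V_\beta^* \otimes |\xi_\alpha\rangle\langle\xi_\beta| = \sum_\alpha V_\alpha|\psi\rangle\langle\psi|V_\alpha^* = T_*(|\psi\rangle\langle\psi|),$$

and, by linearity, Eq. (2.19) holds for every density operator $\rho$. The theorem is proved. ■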

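Remark: When the Hilbert spaces $\mathscr{H}$ and $\mathscr{K}$ are isomorphic, the statement of the theorem simplifies to the following. There exist a Hilbert space $\mathscr{S}$, a unit vector $\Omega \in \mathscr{S}$, and a unitary $U : \mathscr{H} \otimes \mathscr{S} \to \mathscr{H} \otimes \mathscr{S}$ such that $T_*(\rho) = \operatorname{tr}_{\mathscr{S}} U(\rho \otimes |\Omega\rangle\langle\Omega|)U^*$ for all $\rho \in \mathcal{S}(\mathscr{H})$, where $\mathscr{S}$ is determined by Eq. (2.14). □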
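The heart of the proof above is the isometry $V\psi = \sum_\alpha V_\alpha\psi \otimes \xi_\alpha$ of Eq. (2.16). As a rough numerical illustration, the sketch below stacks the Kraus operators of a hypothetical qubit amplitude-damping channel into $V$, checks that $V^*V = I$, and checks that tracing out the auxiliary space reproduces $\sum_\alpha V_\alpha\rho V_\alpha^*$. It does not complete $V$ to the full unitary $U$ of Theorem 2.2.15. The index convention (output space first, auxiliary space second) and the function names are assumptions of this sketch.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def isometry_from_kraus(kraus):
    """V psi = sum_a V_a psi (x) xi_a as a (dim_K * dim_S) x dim_H matrix.
    Row index k * dim_S + a: the K factor comes first, the S factor second."""
    dim_s, dim_k, dim_h = len(kraus), len(kraus[0]), len(kraus[0][0])
    V = [[0j] * dim_h for _ in range(dim_k * dim_s)]
    for a, Va in enumerate(kraus):
        for k in range(dim_k):
            for col in range(dim_h):
                V[k * dim_s + a][col] = complex(Va[k][col])
    return V

def trace_out_S(X, dim_k, dim_s):
    """Partial trace over S of an operator on K (x) S (same index convention)."""
    return [[sum(X[k * dim_s + a][l * dim_s + a] for a in range(dim_s))
             for l in range(dim_k)] for k in range(dim_k)]

gamma = 0.25                                # hypothetical damping parameter
kraus = [[[1, 0], [0, math.sqrt(1 - gamma)]],
         [[0, math.sqrt(gamma)], [0, 0]]]   # amplitude-damping Kraus operators

V = isometry_from_kraus(kraus)
VdV = matmul(dagger(V), V)                  # should equal the identity on H

rho = [[0.6, 0.2 - 0.3j], [0.2 + 0.3j, 0.4]]
dilated = matmul(matmul(V, rho), dagger(V))  # V rho V^* on K (x) S
reduced = trace_out_S(dilated, dim_k=2, dim_s=2)

# Kraus form sum_a V_a rho V_a^* for comparison.
kraus_form = [[0j, 0j], [0j, 0j]]
for Va in kraus:
    term = matmul(matmul(Va, rho), dagger(Va))
    kraus_form = [[kraus_form[i][j] + term[i][j] for j in range(2)] for i in range(2)]
```

Only the diagonal blocks $\alpha = \beta$ of $V\rho V^*$ survive the partial trace, which is the cancellation used in the last display of the proof.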



2.2.4 Duality between channels and bipartite states 

There exists a correspondence between channels $T : B(\mathscr{K}) \to B(\mathscr{H})$ and states over $B(\mathscr{K} \otimes \mathscr{H})$ which, in many situations, is more convenient than the Kraus representation or the ancilla form.

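First we make the following observation. Let $\mathscr{H}$ and $\mathscr{K}$ be Hilbert spaces, and let $A : \mathscr{K} \to \mathscr{H}$ be an operator which we write as

$$A = \sum_{i,\mu} A_{i\mu}\, \langle f_\mu, \cdot\,\rangle\, e_i, \qquad (2.21)$$

where $\{e_i\}$ and $\{f_\mu\}$ are orthonormal bases of $\mathscr{H}$ and $\mathscr{K}$ respectively. Using the Dirac notation, Eq. (2.21) can be rewritten as $A = \sum_{i,\mu} A_{i\mu} |e_i\rangle\langle f_\mu|$. We can view the matrix elements of $A$ as the coefficients, in the basis $\{|e_i \otimes f_\mu\rangle\}$, of a vector in $\mathscr{H} \otimes \mathscr{K}$, which we denote by $|A\rangle\rangle$,

$$|A\rangle\rangle := \sum_{i,\mu} A_{i\mu} |e_i \otimes f_\mu\rangle. \qquad (2.22)$$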

The double-ket notation in Eq. (2.22) is due to Royer [111]. We must caution the reader 
that, although the only object appearing inside the double ket is the operator A, attention 
must be paid to the choice of basis for the tensor product of the corresponding Hilbert 
spaces. 

The correspondence $A \mapsto |A\rangle\rangle$ yields a number of useful formulas, which we summarize in the following lemma [28, 111]. We omit the proof, which consists of routine but tedious manipulations with indices.

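Lemma 2.2.16 Let $\mathscr{H}$ and $\mathscr{K}$ be Hilbert spaces. Then we have the following relations for vectors in $\mathscr{H} \otimes \mathscr{K}$:

$$\langle\langle A|B\rangle\rangle = \operatorname{tr} A^*B, \qquad (2.23)$$
$$(A \otimes B)|C\rangle\rangle = |ACB^{\mathsf{T}}\rangle\rangle, \qquad (2.24)$$
$$\operatorname{tr}_{\mathscr{K}} |A\rangle\rangle\langle\langle B| = AB^*, \qquad (2.25)$$
$$\operatorname{tr}_{\mathscr{H}} |A\rangle\rangle\langle\langle B| = A^{\mathsf{T}}\bar{B}, \qquad (2.26)$$

where $B^{\mathsf{T}}$ denotes the matrix transpose of $B$, and $\bar{B}$ denotes the operator whose matrix elements are obtained by taking the complex conjugates of the matrix elements of $B$.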

Remark: Once again, we point out that the relations stated in Lemma 2.2.16 are valid as long as the matrix elements of the operators $A$, $B$, and $C$ refer to the same choice of bases for $\mathscr{H}$ and $\mathscr{K}$. □
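Since the relations of Lemma 2.2.16 depend on index conventions, a quick numerical check can be reassuring. The sketch below is a minimal, standard-library Python check. It assumes the row-major ordering $|A\rangle\rangle_{i\cdot\dim\mathscr{K}+\mu} = A_{i\mu}$ implied by Eq. (2.22) and uses arbitrary hypothetical $2\times 2$ matrices, and it verifies Eqs. (2.23)-(2.26) up to rounding.

```python
def vec(A):
    """|A>> = sum_{i,mu} A[i][mu] |e_i (x) f_mu>, flattened with index i * dim_K + mu."""
    return [A[i][m] for i in range(len(A)) for m in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def transpose(A):
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def conj(A):
    return [[complex(a).conjugate() for a in row] for row in A]

def kron(A, B):
    """Tensor product A (x) B with row index i * len(B) + k."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def ketbra(u, v):
    """|u><v| for coefficient lists u, v."""
    return [[a * complex(b).conjugate() for b in v] for a in u]

def trace_K(X, dh, dk):
    """Partial trace over the second factor K of an operator on H (x) K."""
    return [[sum(X[i * dk + m][j * dk + m] for m in range(dk)) for j in range(dh)]
            for i in range(dh)]

def trace_H(X, dh, dk):
    """Partial trace over the first factor H of an operator on H (x) K."""
    return [[sum(X[i * dk + m][i * dk + n] for i in range(dh)) for n in range(dk)]
            for m in range(dk)]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# Hypothetical test operators; all are 2x2 here, so dim H = dim K = 2.
A = [[1 + 2j, 0.5], [-1j, 3]]
B = [[0.2, 1 - 1j], [2j, -1]]
C = [[1, 2j], [0.5 - 0.5j, -2]]
dh = dk = 2

inner = sum(complex(a).conjugate() * b for a, b in zip(vec(A), vec(B)))
tr_AdB = sum(matmul(dagger(A), B)[i][i] for i in range(dh))        # Eq. (2.23)

lhs_224 = [sum(row[k] * vec(C)[k] for k in range(dh * dk)) for row in kron(A, B)]
rhs_224 = vec(matmul(matmul(A, C), transpose(B)))                   # Eq. (2.24)

P = ketbra(vec(A), vec(B))
ok_225 = close(trace_K(P, dh, dk), matmul(A, dagger(B)))            # Eq. (2.25)
ok_226 = close(trace_H(P, dh, dk), matmul(transpose(A), conj(B)))   # Eq. (2.26)
```

Changing the flattening order (column-major instead of row-major) would swap the roles of $\operatorname{tr}_{\mathscr{H}}$ and $\operatorname{tr}_{\mathscr{K}}$, which is precisely the caveat of the Remark.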

Before proceeding to our main topic, we give a couple of examples, due to D'Ariano, 
Lo Presti, and Sacchi [28], that illustrate the power of this approach. 

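Example 2.2.17 (maximally entangled states) Let $|\Psi\rangle$ be a pure state in $\mathscr{H}_A \otimes \mathscr{H}_B$, where $\mathscr{H}_A$ and $\mathscr{H}_B$ are Hilbert spaces of the same (finite) dimension $N$. We claim that $\Psi$ is maximally entangled if and only if it can be written in the form $(1/\sqrt{N})|U\rangle\rangle$ for some unitary $U : \mathscr{H}_B \to \mathscr{H}_A$. Assume first that $|\Psi\rangle = (1/\sqrt{N})|U\rangle\rangle$ for a unitary $U : \mathscr{H}_B \to \mathscr{H}_A$. Then, using Eqs. (2.25) and (2.26), we see that the restrictions of $|\Psi\rangle\langle\Psi|$ to $A$ and to $B$ are given by

$$\operatorname{tr}_B |\Psi\rangle\langle\Psi| = (1/N)\operatorname{tr}_B |U\rangle\rangle\langle\langle U| = (1/N)UU^* = (1/N)I_A,$$
$$\operatorname{tr}_A |\Psi\rangle\langle\Psi| = (1/N)\operatorname{tr}_A |U\rangle\rangle\langle\langle U| = (1/N)U^{\mathsf{T}}\bar{U} = (1/N)(U^*U)^{\mathsf{T}} = (1/N)I_B,$$

which shows that $\Psi$ is maximally entangled. On the other hand, suppose $\Psi$ is maximally entangled. Let $|\Psi\rangle = |M\rangle\rangle$, where $M : \mathscr{H}_B \to \mathscr{H}_A$ is some operator. We have

$$\operatorname{tr}_B |\Psi\rangle\langle\Psi| = \operatorname{tr}_B |M\rangle\rangle\langle\langle M| = MM^* = (1/N)I_A,$$
$$\operatorname{tr}_A |\Psi\rangle\langle\Psi| = \operatorname{tr}_A |M\rangle\rangle\langle\langle M| = M^{\mathsf{T}}\bar{M} = (M^*M)^{\mathsf{T}} = (1/N)I_B,$$

which hold if and only if $M = (1/\sqrt{N})U$ for some unitary $U$. □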
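To see Example 2.2.17 in coordinates, the following minimal sketch (plain Python, $N = 2$) builds $|\Psi\rangle = N^{-1/2}|U\rangle\rangle$ for one illustrative unitary $U$, computes both reduced density matrices directly from the coefficients, and compares them with $I/N$. The diagonal operator $M$ is a hypothetical non-unitary comparison case whose reduced state is not maximally mixed.

```python
import math

def coeffs(M):
    """Coefficients of |M>> = sum_{i,mu} M[i][mu] e_i (x) f_mu (index i * N_B + mu)."""
    return [M[i][m] for i in range(len(M)) for m in range(len(M[0]))]

def reduced_A(psi, n_a, n_b):
    """tr_B |psi><psi|, computed entry by entry from the coefficients."""
    return [[sum(psi[i * n_b + m] * complex(psi[j * n_b + m]).conjugate()
                 for m in range(n_b)) for j in range(n_a)] for i in range(n_a)]

def reduced_B(psi, n_a, n_b):
    """tr_A |psi><psi|."""
    return [[sum(psi[i * n_b + m] * complex(psi[i * n_b + n]).conjugate()
                 for i in range(n_a)) for n in range(n_b)] for m in range(n_b)]

N = 2
s = 1 / math.sqrt(2)
U = [[s, 1j * s], [1j * s, s]]             # an example unitary U : H_B -> H_A
psi = [c / math.sqrt(N) for c in coeffs(U)]
rho_A, rho_B = reduced_A(psi, N, N), reduced_B(psi, N, N)

# Comparison: a partially entangled state |M>> with M = diag(sqrt(0.9), sqrt(0.1)).
M = [[math.sqrt(0.9), 0.0], [0.0, math.sqrt(0.1)]]
sigma_A = reduced_A(coeffs(M), N, N)
```

Both reduced states of $\Psi$ come out equal to $I/2$, whereas $\operatorname{tr}_B|M\rangle\rangle\langle\langle M| = MM^* = \operatorname{diag}(0.9, 0.1)$.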

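Example 2.2.18 (the Schmidt decomposition) Let $|A\rangle\rangle \in \mathscr{H}_A \otimes \mathscr{H}_B$ be a pure state. Write down the polar decomposition of $A$, $A = V\sqrt{A^*A}$, where $V$ is unitary, and choose a unitary operator $U$ such that $UA^*AU^*$ is diagonal, i.e., $U\sqrt{A^*A}\,U^* = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|$, where $\{\psi_i\}$ is the basis in which the double kets are defined. Then, by Eq. (2.24),

$$|A\rangle\rangle = |V\sqrt{A^*A}\rangle\rangle = (VU^* \otimes U^{\mathsf{T}})\,|U\sqrt{A^*A}\,U^*\rangle\rangle = \sum_i \lambda_i\, e_i \otimes f_i,$$

where the $\lambda_i$ are the eigenvalues of $\sqrt{A^*A}$ (i.e., the singular values of $A$, with corresponding eigenvectors $U^*\psi_i$), and we have defined the orthonormal vectors $e_i := VU^*\psi_i$, $f_i := U^{\mathsf{T}}\psi_i$. □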

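The matrix approach described above reveals its true strength in the following characterization of channels due to D'Ariano and Lo Presti [26]. Let $T : B(\mathscr{K}) \to B(\mathscr{H})$ be a completely positive map, with the corresponding dual map $T_* : \mathcal{S}(\mathscr{H}) \to \mathcal{S}(\mathscr{K})$. Let $\{e_i\}$ be an orthonormal basis of $\mathscr{H}$, so that $|I\rangle\rangle \in \mathscr{H} \otimes \mathscr{H}$ is the unnormalized maximally entangled state $\sum_i e_i \otimes e_i$. Define on $\mathscr{K} \otimes \mathscr{H}$ the positive operator

$$R_T := (T_* \otimes \mathrm{id})(|I\rangle\rangle\langle\langle I|) \qquad (2.27)$$

(the positivity of $R_T$ follows from the complete positivity of $T_*$). Then the action of $T_*$ on an arbitrary $\rho \in \mathcal{S}(\mathscr{H})$ can be given in terms of $R_T$ as

$$T_*(\rho) = \operatorname{tr}_{\mathscr{H}}\big[(I \otimes \rho^{\mathsf{T}})R_T\big], \qquad (2.28)$$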
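where the transpose operation is performed with respect to the basis $\{e_i\}$. Here is one way to prove Eq. (2.28). Pick a Kraus decomposition $\{V_\alpha\}$ of $T$. Then, using Eq. (2.24), we get

$$R_T = \sum_\alpha (V_\alpha \otimes I)|I\rangle\rangle\langle\langle I|(V_\alpha^* \otimes I) = \sum_\alpha |V_\alpha\rangle\rangle\langle\langle V_\alpha|.$$

Substituting this into the right-hand side of Eq. (2.28) and using Lemma 2.2.16 yields

$$\sum_\alpha \operatorname{tr}_{\mathscr{H}}\big[(I \otimes \rho^{\mathsf{T}})|V_\alpha\rangle\rangle\langle\langle V_\alpha|\big] = \sum_\alpha \operatorname{tr}_{\mathscr{H}} |V_\alpha\rho\rangle\rangle\langle\langle V_\alpha| = \sum_\alpha V_\alpha\rho V_\alpha^* = T_*(\rho).$$

In fact, the map $T_*$ defined in Eq. (2.28) can be extended to a completely positive map on all operators $A \in B(\mathscr{H})$.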
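Eqs. (2.27) and (2.28) can be checked numerically. The following plain-Python sketch assumes, for illustration, an amplitude-damping channel and the ordering $\mathscr{K} \otimes \mathscr{H}$ with row index $k\cdot\dim\mathscr{H} + i$. It builds $R_T = \sum_{ij} T_*(|e_i\rangle\langle e_j|) \otimes |e_i\rangle\langle e_j|$, which is $(T_* \otimes \mathrm{id})(|I\rangle\rangle\langle\langle I|)$ written out in the basis $\{e_i\}$. It then recovers $T_*(\rho)$ as $\operatorname{tr}_{\mathscr{H}}[(I \otimes \rho^{\mathsf{T}})R_T]$ and also checks $\operatorname{tr}_{\mathscr{K}} R_T = I_{\mathscr{H}}$, the trace-preservation condition derived below.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def transpose(A):
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def kron(A, B):
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def apply_dual(kraus, rho):
    """T_*(rho) = sum_a V_a rho V_a^*."""
    dk = len(kraus[0])
    out = [[0j] * dk for _ in range(dk)]
    for V in kraus:
        term = matmul(matmul(V, rho), dagger(V))
        out = [[out[i][j] + term[i][j] for j in range(dk)] for i in range(dk)]
    return out

def r_operator(kraus, dh):
    """R_T = (T_* (x) id)(|I>><<I|) = sum_ij T_*(E_ij) (x) E_ij on K (x) H."""
    dk = len(kraus[0])
    R = [[0j] * (dk * dh) for _ in range(dk * dh)]
    for i in range(dh):
        for j in range(dh):
            E = [[1 if (r, c) == (i, j) else 0 for c in range(dh)] for r in range(dh)]
            block = apply_dual(kraus, E)
            for k in range(dk):
                for l in range(dk):
                    R[k * dh + i][l * dh + j] += block[k][l]
    return R

def from_r_operator(R, rho, dk, dh):
    """Eq. (2.28): tr_H[(I_K (x) rho^T) R_T], tracing out the second factor H."""
    identity_k = [[1 if a == b else 0 for b in range(dk)] for a in range(dk)]
    X = matmul(kron(identity_k, transpose(rho)), R)
    return [[sum(X[k * dh + i][l * dh + i] for i in range(dh)) for l in range(dk)]
            for k in range(dk)]

gamma = 0.25                                # hypothetical amplitude-damping parameter
kraus = [[[1, 0], [0, math.sqrt(1 - gamma)]], [[0, math.sqrt(gamma)], [0, 0]]]
R = r_operator(kraus, dh=2)

rho = [[0.6, 0.2 - 0.3j], [0.2 + 0.3j, 0.4]]
via_R = from_r_operator(R, rho, dk=2, dh=2)
via_kraus = apply_dual(kraus, rho)

# Trace preservation appears as tr_K R_T = I_H (trace over the first factor).
trK_R = [[sum(R[k * 2 + i][k * 2 + j] for k in range(2)) for j in range(2)]
         for i in range(2)]
```

Building $R_T$ from the basis matrices $|e_i\rangle\langle e_j|$ rather than from the Kraus form $\sum_\alpha |V_\alpha\rangle\rangle\langle\langle V_\alpha|$ keeps the check independent of the proof given above.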

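The operator $R_T$ is the unique operator for which Eq. (2.28) holds. To see this, assume that, to the contrary, Eq. (2.28) holds with some other operator $R$ in place of $R_T$. Then, for any $\rho \in \mathcal{S}(\mathscr{H})$,

$$\operatorname{tr}_{\mathscr{H}}\big[(I \otimes \rho^{\mathsf{T}})(R_T - R)\big] = 0 \in B(\mathscr{K}).$$

Taking $\rho = |\psi\rangle\langle\psi|$ with $\psi$ a unit vector, for which $\rho^{\mathsf{T}} = |\bar\psi\rangle\langle\bar\psi|$ ($\bar\psi$ being the vector whose components in the basis $\{e_i\}$ are the complex conjugates of those of $\psi$), this says that ${}_{\mathscr{H}}\langle\bar\psi|(R_T - R)|\bar\psi\rangle_{\mathscr{H}} = 0$ for every unit vector $\bar\psi$. The fact that $R_T = R$ is now a consequence of the following lemma [26].

Lemma 2.2.19 Let $X$ be an operator on $\mathscr{K} \otimes \mathscr{H}$. Suppose that, for any $\psi \in \mathscr{H}$, the operator ${}_{\mathscr{H}}\langle\psi|X|\psi\rangle_{\mathscr{H}} \in B(\mathscr{K})$ is the zero operator. Then $X$ is the zero operator on $\mathscr{K} \otimes \mathscr{H}$.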

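Thus we have shown that, for any completely positive $T : B(\mathscr{K}) \to B(\mathscr{H})$, there exists a unique positive operator $R_T \in B(\mathscr{K} \otimes \mathscr{H})$ such that Eq. (2.28) holds. However, this correspondence works in the reverse direction as well. That is, given a positive operator $R \in B(\mathscr{K} \otimes \mathscr{H})$, the map

$$T_*^R(A) := \operatorname{tr}_{\mathscr{H}}\big[(I \otimes A^{\mathsf{T}})R\big] \in B(\mathscr{K}) \qquad (2.29)$$

is completely positive. In order to show this, we need the following trivial lemma.

Lemma 2.2.20 Let $\mathscr{H}$ be a Hilbert space. An operator $X \in B(\mathscr{H})$ is positive if and only if it can be written in the form $X = \sum_\alpha |\psi_\alpha\rangle\langle\psi_\alpha|$ for some collection $\{\psi_\alpha\}$ of vectors in $\mathscr{H}$.

Now let a positive operator $R \in B(\mathscr{K} \otimes \mathscr{H})$ be given. Then Lemma 2.2.20 states that we can write $R = \sum_\alpha |V_\alpha\rangle\rangle\langle\langle V_\alpha|$, where $V_\alpha$ are some operators from $\mathscr{H}$ to $\mathscr{K}$. Substituting this form of $R$ into Eq. (2.29), we get $T_*^R(A) = \sum_\alpha V_\alpha A V_\alpha^*$, which is completely positive. The map $T_*^R$ is then the dual of the map $T^R : B(\mathscr{K}) \to B(\mathscr{H})$, $T^R(B) = \sum_\alpha V_\alpha^* B V_\alpha$, which is also completely positive.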

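We are interested in the specific case when $T : B(\mathscr{K}) \to B(\mathscr{H})$ is a channel. Then $T_* : B(\mathscr{H}) \to B(\mathscr{K})$ is a trace-preserving map. That is, for any $A \in B(\mathscr{H})$, we must have

$$\operatorname{tr} T_*(A) = \operatorname{tr}\big(A^{\mathsf{T}} \operatorname{tr}_{\mathscr{K}} R_T\big) = \operatorname{tr} A = \operatorname{tr} A^{\mathsf{T}},$$

which implies that $\operatorname{tr}_{\mathscr{K}} R_T = I_{\mathscr{H}}$. We can summarize everything we have said up to now in the following theorem [26].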
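Theorem 2.2.21 (duality between channels and bipartite states) Let $\mathscr{H}$ and $\mathscr{K}$ be finite-dimensional Hilbert spaces. There exists a one-to-one correspondence between channels $T : B(\mathscr{K}) \to B(\mathscr{H})$ and density operators $\rho \in \mathcal{S}(\mathscr{K} \otimes \mathscr{H})$ with $\operatorname{tr}_{\mathscr{K}}\rho = (1/\dim\mathscr{H})\, I_{\mathscr{H}}$, given by

$$T_*(A) = \dim\mathscr{H}\, \operatorname{tr}_{\mathscr{H}}\big[(I \otimes A^{\mathsf{T}})\rho\big] \quad \forall A \in B(\mathscr{H}), \qquad (2.30)$$

where $T_* : B(\mathscr{H}) \to B(\mathscr{K})$ is the dual channel corresponding to $T$.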

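This correspondence can be extended to tensor products of channels in the following way [23, 149]. Let $S : B(\mathscr{K}_A) \to B(\mathscr{H}_A)$ and $T : B(\mathscr{K}_B) \to B(\mathscr{H}_B)$ be channels, and let $\{e_i\}$ and $\{f_\mu\}$ be orthonormal bases of $\mathscr{H}_A$ and $\mathscr{H}_B$ respectively. Define the vectors

$$|I\rangle\rangle_{AA} := \sum_i |e_i \otimes e_i\rangle, \qquad |I\rangle\rangle_{BB} := \sum_\mu |f_\mu \otimes f_\mu\rangle$$

and the operator

$$R_{S\otimes T} := (S_* \otimes \mathrm{id} \otimes T_* \otimes \mathrm{id})\big(|I\rangle\rangle_{AA}\langle\langle I| \otimes |I\rangle\rangle_{BB}\langle\langle I|\big). \qquad (2.31)$$

Then the action of $S_* \otimes T_*$ on density operators $\rho \in \mathcal{S}(\mathscr{H}_A \otimes \mathscr{H}_B)$ is given by

$$(S_* \otimes T_*)(\rho) = \operatorname{tr}_{\mathscr{H}_A}\operatorname{tr}_{\mathscr{H}_B}\big[(I \otimes \rho^{\mathsf{T}})R_{S\otimes T}\big],$$

where $\rho^{\mathsf{T}}$ is the transpose with respect to the basis $\{e_i \otimes f_\mu\}$ and acts on the factors $\mathscr{H}_A$ and $\mathscr{H}_B$ of $\mathscr{K}_A \otimes \mathscr{H}_A \otimes \mathscr{K}_B \otimes \mathscr{H}_B$, while $I$ acts on $\mathscr{K}_A \otimes \mathscr{K}_B$. In the case of a product density operator $\rho_A \otimes \rho_B \in \mathcal{S}(\mathscr{H}_A \otimes \mathscr{H}_B)$ we recover the correct relation $(S_* \otimes T_*)(\rho_A \otimes \rho_B) = S_*(\rho_A) \otimes T_*(\rho_B)$.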



2.3 Distinguishability measures for states 

In quantum information theory, we frequently encounter the following problem: given two states $\omega_1$ and $\omega_2$, to what extent does one of them approximate the other? This problem is relevant, e.g., for the circuit model of quantum computation [8], whenever we need to determine how much the output states of an "ideal" quantum computer differ from the corresponding output states of the quantum circuit that approximates it. In this section we concentrate on two such measures of closeness for states, the trace-norm distance and the Jozsa-Uhlmann fidelity. We will freely use the concepts from the theory of trace ideals; the necessary background information is given in Sec. A.3.

2.3.1 Trace-norm distance

Any state $\omega$ over an algebra of observables $\mathcal{A}$ is given essentially as a "catalogue" of the expectation values $\omega(A)$ for all $A \in \mathcal{A}$. Therefore it makes intuitive sense to say that two states $\omega_1$ and $\omega_2$ are close if the corresponding expectations $\omega_1(A)$ and $\omega_2(A)$ are close for all $A \in \mathcal{A}$. Without loss of generality, we can compare the expectations $\omega_1(A)$ and $\omega_2(A)$ on the unit ball of $\mathcal{A}$, i.e., the set of all $A \in \mathcal{A}$ with $\|A\| \le 1$. The corresponding measure of closeness between $\omega_1$ and $\omega_2$ will thus be given by the variational expression

$$D(\omega_1, \omega_2) = \sup_{A \in \mathcal{A};\, \|A\| \le 1} |\omega_1(A) - \omega_2(A)|.$$

In fact, we can vary only over the group $\mathcal{U}(\mathcal{A})$ of the unitary elements of $\mathcal{A}$ (i.e., those $U \in \mathcal{A}$ for which $UU^* = U^*U = \mathbb{1}$). This follows from the Russo-Dye theorem [30, p. 25], which states that the unit ball in a C*-algebra $\mathcal{A}$ with identity is the closed convex hull of the unitary elements of $\mathcal{A}$, together with the fact that the continuous convex function $A \mapsto |\omega_1(A) - \omega_2(A)|$ has the same supremum on a set as on its closed convex hull. So we take

$$D(\omega_1, \omega_2) = \sup_{U \in \mathcal{U}(\mathcal{A})} |\omega_1(U) - \omega_2(U)| \qquad (2.32)$$

as the putative measure of distance between $\omega_1$ and $\omega_2$.

Consider a concrete quantum system with the Hilbert space $\mathscr{H}$, and let $\rho_1$ and $\rho_2$ be a pair of density operators on $\mathscr{H}$. Then Eq. (2.32) will take the form

$$D(\rho_1, \rho_2) = \sup_{U \in \mathcal{U}(\mathscr{H})} |\operatorname{tr}(\rho_1 U) - \operatorname{tr}(\rho_2 U)|. \qquad (2.33)$$

Now we can use the fact that, for any trace-class operator $A$ on $\mathscr{H}$, the maximum of $|\operatorname{tr}(AU)|$ over all unitaries $U$ is attained when $AU \ge 0$ and equals the trace norm $\|A\|_1 := \operatorname{tr}(A^*A)^{1/2} = \operatorname{tr}|A|$ [119, p. 43]. In other words, $D(\rho_1, \rho_2)$ is precisely the trace-norm distance $\|\rho_1 - \rho_2\|_1$.

Remark: Please note that the trace-norm distance $D(\rho_1, \rho_2)$ defined here is twice the trace distance $D(\rho_1, \rho_2)$ defined by Nielsen and Chuang [91]. Therefore, in order to avoid confusion, we will no longer use the notation $D(\cdot, \cdot)$. □

The trace-norm distance is obviously a metric on the set $\mathcal{S}(\mathscr{H})$ of all density operators on $\mathscr{H}$, and therefore possesses all the properties that make "geometrical sense" (e.g., the triangle inequality). In particular, because $\|\rho\|_1 = 1$ for any $\rho \in \mathcal{S}(\mathscr{H})$, we have $0 \le \|\rho_1 - \rho_2\|_1 \le 2$. It follows from the standard properties of norms that the minimum value is attained if and only if $\rho_1 = \rho_2$, and it can be shown [115] that the maximum value 2 results if and only if $\rho_1\rho_2 = 0$ (i.e., if and only if $\rho_1$ and $\rho_2$ have orthogonal ranges). When $\rho_1$ and $\rho_2$ are pure states, one can readily derive the formula $\|\rho_1 - \rho_2\|_1 = 2\sqrt{1 - |\langle\psi_1|\psi_2\rangle|^2}$, where $\psi_1$ and $\psi_2$ are the corresponding state vectors. Furthermore, we have the following key result.

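Theorem 2.3.1 Let $T : B(\mathscr{H}) \to B(\mathscr{H})$ be a channel. Then, for any $\rho_1, \rho_2 \in \mathcal{S}(\mathscr{H})$, we have the following.

1. $\|T_*(\rho_1) - T_*(\rho_2)\|_1 \le \|\rho_1 - \rho_2\|_1$.

2. If $T$ is unitarily implemented, i.e., $T(A) = UAU^*$ for a unitary $U : \mathscr{H} \to \mathscr{H}$, then $\|T_*(\rho_1) - T_*(\rho_2)\|_1 = \|\rho_1 - \rho_2\|_1$.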
Proof: The proof of the first statement, due to Ruskai [113], runs as follows. Write $\rho_1 - \rho_2$ as a difference of two positive operators $N_+, N_-$ with orthogonal ranges, so that $|\rho_1 - \rho_2| = N_+ + N_-$. Then

$$\begin{aligned}
\|T_*(\rho_1) - T_*(\rho_2)\|_1 &= \|T_*(N_+) - T_*(N_-)\|_1 \\
&\le \|T_*(N_+)\|_1 + \|T_*(N_-)\|_1 \\
&= \operatorname{tr} T_*(N_+) + \operatorname{tr} T_*(N_-) \\
&= \operatorname{tr}(N_+ + N_-) \\
&= \|\rho_1 - \rho_2\|_1,
\end{aligned}$$

where the third line uses the positivity of $T_*(N_\pm)$ and the fourth uses the fact that $T_*$ is trace-preserving. This proves the first statement.

To prove the second statement, note that $V \mapsto UVU^*$ is a group automorphism of $\mathcal{U}(\mathscr{H})$. Therefore substituting $U^*\rho_1 U$ and $U^*\rho_2 U$ into Eq. (2.33) instead of $\rho_1$ and $\rho_2$ does not change the value of the supremum. ■
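For qubits, both the pure-state formula $\|\rho_1 - \rho_2\|_1 = 2\sqrt{1 - |\langle\psi_1|\psi_2\rangle|^2}$ and part 1 of Theorem 2.3.1 can be illustrated with a few lines of plain Python. The sketch computes the trace norm of a $2\times2$ Hermitian matrix from its two eigenvalues. The depolarizing map $\rho \mapsto (1-p)\rho + pI/2$ is an example channel chosen for this sketch. It scales $\rho_1 - \rho_2$ by $1 - p$, so the trace-norm distance should shrink by exactly that factor.

```python
import cmath
import math

def trace_norm_2x2(X):
    """Trace norm of a 2x2 Hermitian matrix: |lambda_+| + |lambda_-|."""
    t = (X[0][0] + X[1][1]).real
    det = (X[0][0] * X[1][1] - X[0][1] * X[1][0]).real
    disc = math.sqrt(max(t * t / 4 - det, 0.0))
    return abs(t / 2 + disc) + abs(t / 2 - disc)

def projector(psi):
    """|psi><psi| for a normalized qubit state."""
    return [[a * b.conjugate() for b in psi] for a in psi]

def diff(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

def depolarize(rho, p):
    """Example channel (an assumption of this sketch): rho -> (1 - p) rho + p I/2."""
    return [[(1 - p) * rho[i][j] + (p / 2 if i == j else 0) for j in range(2)]
            for i in range(2)]

theta, phi, p = 0.7, 1.2, 0.3               # arbitrary illustrative parameters
psi1 = [1 + 0j, 0j]
psi2 = [complex(math.cos(theta)), cmath.exp(1j * phi) * math.sin(theta)]
rho1, rho2 = projector(psi1), projector(psi2)

overlap = sum(a.conjugate() * b for a, b in zip(psi1, psi2))
d_before = trace_norm_2x2(diff(rho1, rho2))
d_pure_formula = 2 * math.sqrt(1 - abs(overlap) ** 2)
d_after = trace_norm_2x2(diff(depolarize(rho1, p), depolarize(rho2, p)))
```

Here $\rho_1 - \rho_2$ is traceless with eigenvalues $\pm\sin\theta$, so $d_{\text{before}} = 2\sin\theta$ and $d_{\text{after}} = (1-p)\,d_{\text{before}}$.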

The trace-norm distance also has an operational characterization in terms of general- 
ized quantum measurements, and we will come back to it in Sec. 2.3.3. 



2.3.2 Jozsa-Uhlmann fidelity 

Another useful distinguishability measure for quantum states, the fidelity, is given by the formidable-looking expression

$$F(\rho_1, \rho_2) := \Big(\operatorname{tr}\sqrt{\sqrt{\rho_1}\,\rho_2\sqrt{\rho_1}}\Big)^2, \qquad (2.34)$$

where $\rho_1$ and $\rho_2$ are a pair of density operators. The fidelity (2.34) was introduced by Jozsa [63], but the original idea came from the work of Uhlmann [138], who generalized the notion of the "transition probability" $|\langle\psi|\phi\rangle|^2$ for pure states to general states over C*-algebras. For this reason we will refer to the fidelity $F$ as the Jozsa-Uhlmann fidelity.

The main appeal of the Jozsa-Uhlmann fidelity lies in the result known as the Uhlmann theorem [138]. We state this theorem in the form given by Jozsa [63].

Theorem 2.3.2 (Uhlmann) Let $\rho_1$ and $\rho_2$ be density operators on a Hilbert space $\mathscr{H}$. Then

$$F(\rho_1, \rho_2) = \max |\langle\psi_1|\psi_2\rangle|^2, \qquad (2.35)$$

where the maximum is taken over all purifications $\psi_1$ and $\psi_2$ of $\rho_1$ and $\rho_2$ respectively in an extended Hilbert space $\mathscr{H} \otimes \mathscr{K}$.

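Proof: Without loss of generality, we may take $\mathscr{K} \cong \mathscr{H}$, because only the nonzero eigenvalues of a density operator are relevant for constructing its purification. Let $\{e_i\}$ be the eigenvectors of $\rho_1$, and $\{f_i\}$ the eigenvectors of $\rho_2$. We can write all purifications of $\rho_1$ and $\rho_2$ in the form $|\sqrt{\rho_1}V\rangle\rangle$ and $|\sqrt{\rho_2}UW\rangle\rangle$ with respect to the basis $\{e_i\}$, where $f_i = Ue_i$, and $V$ and $W$ are the unitaries corresponding to the choice of basis in the auxiliary Hilbert space for each of the purifications. Writing

$$\langle\langle\sqrt{\rho_2}UW|\sqrt{\rho_1}V\rangle\rangle = \operatorname{tr}(W^*U^*\sqrt{\rho_2}\sqrt{\rho_1}V) = \operatorname{tr}(VW^*U^*\sqrt{\rho_2}\sqrt{\rho_1})$$

and observing that $U$ is determined by $\rho_1$ and $\rho_2$, while $VW^*U^*$ ranges over all of $\mathcal{U}(\mathscr{H})$ as $V$ does, we see that the maximum in Eq. (2.35) can be written as $\max_{V \in \mathcal{U}(\mathscr{H})} |\operatorname{tr}(\sqrt{\rho_2}\sqrt{\rho_1}V)|^2$ and hence equals

$$\Big(\operatorname{tr}\big|\sqrt{\rho_2}\sqrt{\rho_1}\big|\Big)^2 = \Big(\operatorname{tr}\sqrt{\sqrt{\rho_1}\,\rho_2\sqrt{\rho_1}}\Big)^2.$$

This proves the theorem. ■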



Apart from its immediate physical significance, the Uhlmann theorem allows us to derive the properties of the fidelity (2.34). We summarize these properties in the theorem below; for the proofs, the reader is referred to the paper of Jozsa [63].

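Theorem 2.3.3 (properties of the Jozsa-Uhlmann fidelity)

1. $0 \le F(\rho_1, \rho_2) \le 1$, and $F(\rho_1, \rho_2) = 1$ if and only if $\rho_1 = \rho_2$.

2. $F$ is a symmetric function: $F(\rho_1, \rho_2) = F(\rho_2, \rho_1)$.

3. If $\rho_1$ is pure, then $F(\rho_1, \rho_2) = \operatorname{tr}(\rho_1\rho_2)$ for any $\rho_2$. Otherwise, $F(\rho_1, \rho_2) \ge \operatorname{tr}(\rho_1\rho_2)$.

4. For a fixed $\rho$, $F(\rho, \cdot)$ is a concave function: $F(\rho, \lambda_1\rho_1 + \lambda_2\rho_2) \ge \lambda_1 F(\rho, \rho_1) + \lambda_2 F(\rho, \rho_2)$ for any positive real numbers $\lambda_1, \lambda_2$ with $\lambda_1 + \lambda_2 = 1$.

5. $F$ is multiplicative with respect to tensor products: $F(\rho_1 \otimes \rho_2, \rho_3 \otimes \rho_4) = F(\rho_1, \rho_3)\,F(\rho_2, \rho_4)$.

6. If $T$ is a channel, then $F(T_*(\rho_1), T_*(\rho_2)) \ge F(\rho_1, \rho_2)$, where equality holds for all $\rho_1, \rho_2$ when $T$ is unitarily implemented.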

It can be shown that the trace-norm distance and the Jozsa-Uhlmann fidelity are 
equivalent distinguishability measures for quantum states. This follows from the following 
key theorem [12], given here without proof. 

Theorem 2.3.4 (Fuchs-van de Graaf) For any two density operators $\rho_1, \rho_2$,

$$2 - 2\sqrt{F(\rho_1, \rho_2)} \le \|\rho_1 - \rho_2\|_1 \le 2\sqrt{1 - F(\rho_1, \rho_2)}. \qquad (2.36)$$

Since the Jozsa-Uhlmann fidelity is easier to compute than the trace-norm distance, the 
Fuchs-van de Graaf theorem provides a quick and painless way to get tight estimates of 
the trace-norm distance. 
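The bounds (2.36) are easy to check for qubits. The sketch below relies on the $2\times2$ identity $F(\rho, \sigma) = \operatorname{tr}(\rho\sigma) + 2\sqrt{\det\rho\,\det\sigma}$. This identity follows from $(\operatorname{tr}\sqrt{M})^2 = \operatorname{tr}M + 2\sqrt{\det M}$ for a positive $2\times2$ matrix $M$, applied to $M = \sqrt{\rho}\,\sigma\sqrt{\rho}$. It holds only for qubits and is not taken from the text. The density matrices are illustrative.

```python
import math

def det2(A):
    return (A[0][0] * A[1][1] - A[0][1] * A[1][0]).real

def tr_product(A, B):
    return sum(A[i][j] * B[j][i] for i in range(2) for j in range(2)).real

def fidelity_qubit(rho, sigma):
    """Jozsa-Uhlmann fidelity (2.34) of two 2x2 density matrices via the qubit
    identity F = tr(rho sigma) + 2 sqrt(det rho * det sigma)."""
    return tr_product(rho, sigma) + 2 * math.sqrt(max(det2(rho) * det2(sigma), 0.0))

def trace_distance_qubit(rho, sigma):
    """||rho - sigma||_1: the difference is traceless, with eigenvalues +-sqrt(-det)."""
    D = [[rho[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]
    return 2 * math.sqrt(max(-det2(D), 0.0))

rho1 = [[0.6, 0.2 - 0.3j], [0.2 + 0.3j, 0.4]]   # illustrative mixed states
rho2 = [[0.3, 0.1], [0.1, 0.7]]
F = fidelity_qubit(rho1, rho2)
dist = trace_distance_qubit(rho1, rho2)
lower, upper = 2 - 2 * math.sqrt(F), 2 * math.sqrt(1 - F)   # Eq. (2.36)
```

For these two states $F \approx 0.797$ and $\|\rho_1 - \rho_2\|_1 = 2\sqrt{0.19} \approx 0.872$, which lies between the two bounds. When one state is pure, $\det$ vanishes and the formula reduces to $\operatorname{tr}(\rho_1\rho_2)$, in line with property 3 of Theorem 2.3.3.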



2.3.3 Quantum detection theory 

As we have shown in Example 2.2.8, any measurement performed on a quantum system with the Hilbert space $\mathscr{H}$ can be described by a collection of effects $F_x$ that form a resolution of identity on $\mathscr{H}$, i.e., $\sum_x F_x = I$. This is an example of a positive operator-valued measure, which is defined as follows [59].

Definition 2.3.5 Let $(\mathscr{X}, \Sigma)$ be a measurable space, i.e., $\mathscr{X}$ is a set and $\Sigma$ is a $\sigma$-algebra of subsets of $\mathscr{X}$, and let $\mathscr{H}$ be a Hilbert space. Then a positive operator-valued measure (POVM) on $\mathscr{X}$ with values in $B(\mathscr{H})$ is a map $F$ from $\Sigma$ to the positive operators on $\mathscr{H}$ which is
(a) normalized: $F(\emptyset) = 0$, $F(\mathscr{X}) = I$;

(b) $\sigma$-additive: for any countable collection $\{\Delta_i\}$ of pairwise disjoint sets in $\Sigma$, $F(\bigcup_i \Delta_i) = \sum_i F(\Delta_i)$, where the sum converges in the strong operator topology.

The definition just given is the most general. In this section we will content ourselves with the case when the set $\mathscr{X}$ is finite, so we will not have to deal with $\sigma$-algebras and the like. Then any POVM is simply a collection of positive operators $\{F_x \mid x \in \mathscr{X}\}$ with $\sum_x F_x = I$. These operators will be referred to as the elements of the POVM.

Consider the following problem. We are presented with a quantum system whose state is unknown, but we are told that it is drawn from some known set $\{\rho_m\}_{m=1}^M$ according to the probability distribution $\{p_m\}_{m=1}^M$. Our task is to devise a measurement that would maximize the probability of correctly identifying the state. This is known as the $M$-ary quantum detection problem [58].

Any measurement we would perform will be described by a POVM $\{F_m\}$ on the $M$-element set $\{1, \ldots, M\}$. Given the state $\rho$, the probability of identifying $\rho$ as $\rho_m$ is equal to $\operatorname{tr}(\rho F_m)$. Thus the average probability of correct decision using the POVM $F := \{F_m\}$ is given by

$$P_c[F] := \sum_{m=1}^M p_m \operatorname{tr}(\rho_m F_m). \qquad (2.37)$$

The problem of designing the optimum $M$-ary quantum detector thus amounts to finding the $M$-element POVM $F$ that would maximize $P_c[F]$.

It is not possible to give a general closed-form expression for the POVM that would 
maximize Eq. (2.37). However, a theorem of Yuen, Kennedy, and Lax [147] gives necessary 
and sufficient conditions for a given POVM F to be a maximizer of -Pc[-^]- Usually the 
candidate POVM's are found by inspection or by taking advantage of the problem's in- 
trinsic symmetries, should they exist, in which case the Yuen-Kennedy-Lax theorem gives 
a quick way to verify the optimality. For many interesting examples, the reader is invited 
to consult their article [147], as well as the book by Helstrom [58]. 

We give a complete solution of the binary quantum detection problem ($M = 2$). In this case we are considering two-element POVM's $\{F, I - F\}$, so there is only one independent operator $F$, which must satisfy the condition $0 \le F \le I$. The corresponding variational expression is

$$P_c = \max_{0 \le F \le I} \{p_1 \operatorname{tr}(\rho_1 F) + p_2 \operatorname{tr}[\rho_2(I - F)]\},$$

which simplifies to

$$P_c = p_2 + \max_{0 \le F \le I} \operatorname{tr}[(p_1\rho_1 - p_2\rho_2)F]. \qquad (2.38)$$

We have the following theorem [58].

Theorem 2.3.6 (optimum binary quantum detection) Consider the binary quantum detection problem for the density operators $\rho_1$ and $\rho_2$ and the probabilities $p_1$ and $p_2$. Then the optimum average probability of correct decision is given by

$$P_c = \frac{1}{2} + \frac{1}{2}\|p_1\rho_1 - p_2\rho_2\|_1, \qquad (2.39)$$

and the elements of the optimum POVM can be chosen to be projection operators.
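Proof: Write down the orthogonal decomposition $p_1\rho_1 - p_2\rho_2 = R_+ - R_-$, where $R_\pm \ge 0$ and $R_+R_- = 0$. Because $R_- \ge 0$, we have $\operatorname{tr}(R_- F) \ge 0$ for any $F \ge 0$, so

$$\max_{0 \le F \le I} \operatorname{tr}[(R_+ - R_-)F] \le \max_{0 \le F \le I} \operatorname{tr}(R_+ F) \le \operatorname{tr} R_+,$$

and both bounds are attained by the projection operator $P$ with $PR_+ = R_+$ and $PR_- = 0$. Thus

$$\max_{0 \le F \le I} \operatorname{tr}[(R_+ - R_-)F] = \operatorname{tr} R_+.$$

Because $\operatorname{tr}(p_1\rho_1 - p_2\rho_2) = p_1 - p_2$, we have $\operatorname{tr} R_- = \operatorname{tr} R_+ + p_2 - p_1$. Also

$$\|p_1\rho_1 - p_2\rho_2\|_1 = \operatorname{tr}|p_1\rho_1 - p_2\rho_2| = \operatorname{tr} R_+ + \operatorname{tr} R_- = 2\operatorname{tr} R_+ + p_2 - p_1,$$

whence it follows, using $p_1 + p_2 = 1$, that

$$\max_{0 \le F \le I} P_c[F] = p_2 + \max_{0 \le F \le I} \operatorname{tr}[(p_1\rho_1 - p_2\rho_2)F] = p_2 + \operatorname{tr} R_+ = \frac{1}{2} + \frac{1}{2}\|p_1\rho_1 - p_2\rho_2\|_1.$$

The optimizing POVM is then given by $\{P, I - P\}$. ■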

Theorem 2.3.6 clearly exhibits the prominent role played by the trace-norm distance in the quantitative characterization of the performance of generalized quantum measurements. In particular, the probability of correct discrimination between two equiprobable states $\rho_1$ and $\rho_2$ equals $1/2 + (1/4)\|\rho_1 - \rho_2\|_1$. Furthermore, the maximum average probability (2.39) of correct decision equals unity if and only if $\rho_1$ and $\rho_2$ are such that the trace norm $\|p_1\rho_1 - p_2\rho_2\|_1$ attains its maximum value of $1$ ($= p_1 + p_2$), which happens if and only if $\rho_1\rho_2 = 0$. In the case when $\rho_1$ and $\rho_2$ are pure states, this reduces to the requirement that the corresponding state vectors be orthogonal.
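Theorem 2.3.6 can be illustrated for two pure qubit states. The following plain-Python sketch diagonalizes $p_1\rho_1 - p_2\rho_2$ in closed form, evaluates Eq. (2.39), builds the projector $P$ onto the positive part, and checks that the POVM $\{P, I - P\}$ attains the bound. The states $|0\rangle$, $|+\rangle$ and the priors $0.6$, $0.4$ are illustrative choices. As an independent cross-check, it also uses the familiar pure-state form $\frac12\big(1 + \sqrt{1 - 4p_1p_2|\langle\psi_1|\psi_2\rangle|^2}\big)$ of the same optimum.

```python
import math

def helstrom_qubit(p1, rho1, p2, rho2):
    """Optimal success probability (2.39) and the projector P onto the positive part
    of p1*rho1 - p2*rho2, for 2x2 density matrices. Assumes (as in the example below)
    a nonzero off-diagonal entry and one positive and one negative eigenvalue."""
    D = [[p1 * rho1[i][j] - p2 * rho2[i][j] for j in range(2)] for i in range(2)]
    a, b, d = D[0][0].real, D[0][1], D[1][1].real
    t, det = a + d, a * d - abs(b) ** 2
    disc = math.sqrt(t * t / 4 - det)
    lam_plus, lam_minus = t / 2 + disc, t / 2 - disc
    p_c = 0.5 + 0.5 * (abs(lam_plus) + abs(lam_minus))
    v = [complex(b), complex(lam_plus - a)]          # eigenvector for lam_plus
    norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    v = [x / norm for x in v]
    P = [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]
    return p_c, P

def success_probability(p1, rho1, p2, rho2, F):
    """P_c[{F, I - F}] = p1 tr(rho1 F) + p2 tr[rho2 (I - F)]."""
    tr1 = sum(rho1[i][j] * F[j][i] for i in range(2) for j in range(2)).real
    tr2 = sum(rho2[i][j] * F[j][i] for i in range(2) for j in range(2)).real
    return p1 * tr1 + p2 * (1 - tr2)

p1, p2 = 0.6, 0.4
rho1 = [[1.0, 0.0], [0.0, 0.0]]             # |0><0|
rho2 = [[0.5, 0.5], [0.5, 0.5]]             # |+><+|
p_c, P = helstrom_qubit(p1, rho1, p2, rho2)
p_c_with_P = success_probability(p1, rho1, p2, rho2, P)

# Pure-state form of the same bound, with |<0|+>|^2 = 1/2.
p_c_pure = 0.5 * (1 + math.sqrt(1 - 4 * p1 * p2 * 0.5))
```

Measuring in the basis $\{|0\rangle, |1\rangle\}$ (i.e., $F = \rho_1$) gives only $0.8$, below the optimum of roughly $0.861$.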

The trace-norm distance between states $\rho_1$ and $\rho_2$ can also be expressed as [91, p. 405]

$$\|\rho_1 - \rho_2\|_1 = \sup_{\{F_m\}} \sum_m |\operatorname{tr}(\rho_1 F_m) - \operatorname{tr}(\rho_2 F_m)|, \qquad (2.40)$$

where the supremum is taken with respect to all POVM's whose elements belong to $B(\mathscr{H})$, where $\mathscr{H}$ is the Hilbert space on which the density operators $\rho_1$ and $\rho_2$ act. A similar expression can be derived for $\|p_1\rho_1 - p_2\rho_2\|_1$, which shows that it is sufficient to consider only two-element POVM's for the solution of the binary quantum detection problem. The sum on the right-hand side of Eq. (2.40) is the so-called Kolmogorov distance between the probability distributions $\{\operatorname{tr}(\rho_1 F_m)\}_{m=1}^M$ and $\{\operatorname{tr}(\rho_2 F_m)\}_{m=1}^M$.

Incidentally, the Jozsa-Uhlmann fidelity (2.34) can likewise be given an intuitive operational meaning in terms of generalized quantum measurements by means of the formula [91, p. 412]

$$\sqrt{F(\rho_1, \rho_2)} = \inf_{\{F_m\}} \sum_m \sqrt{\operatorname{tr}(\rho_1 F_m)\,\operatorname{tr}(\rho_2 F_m)}. \qquad (2.41)$$

The sum on the right-hand side of (2.41) is the classical fidelity of the two outcome distributions and is intimately related to the Fisher metric [6, p. 29], a Riemannian metric on the manifold of probability distributions on an $m$-element sample space: its arccosine is, up to a constant factor, the geodesic distance induced by that metric. The physical meaning of Eqs. (2.40) and (2.41) is apparent: whenever we engage in the business of distinguishing quantum states, we are essentially distinguishing probability distributions describing the outcomes of generalized quantum measurements.
2.4 Distinguishability measures for channels

In the preceding section we have discussed ways in which we can compare quantum states. 
It is also important to have at our disposal some tools for the comparison of channels. In 
this section we describe two distinguishability measures for channels, the cb-norm distance 
and the channel fidelity. The latter distinguishability measure was defined and studied by 
the present author [103]. 



2.4.1 Norm of complete boundedness 

Just as we have defined a distinguishability measure for states as a metric induced by the trace norm, it should likewise be possible to construct a distinguishability measure for channels from $B(\mathscr{H})$ to $B(\mathscr{H})$ in a natural way from a suitable metric on the set of all completely positive maps from $B(\mathscr{H})$ to $B(\mathscr{H})$. One possible candidate is the metric induced by the operator norm,

$$\|T\| := \sup_{X \in B(\mathscr{H});\, \|X\| = 1} \|T(X)\|. \qquad (2.42)$$

Unfortunately, the operator norm is rather ill-behaved: it is not stable with respect to tensor products. In particular, there are some positive maps $T$ for which the norm $\|T \otimes \mathrm{id}_n\|$ increases with $n$, as the following example [97] shows.

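Example 2.4.1 (transposition map revisited) Consider the transposition map $\theta$ (cf. Example 2.2.1) on the algebra $M_2$, and define the map $\theta_2 := \theta \otimes \mathrm{id}_2$ on $M_2 \otimes M_2$. Let $F$ be the flip operator

$$F = \begin{pmatrix} e_{11} & e_{21} \\ e_{12} & e_{22} \end{pmatrix} = \sum_{i,j=1}^{2} e_{ij} \otimes e_{ji},$$

for which we have $\|F\| = 1$. Now

$$\theta_2(F) = \begin{pmatrix} \theta(e_{11}) & \theta(e_{21}) \\ \theta(e_{12}) & \theta(e_{22}) \end{pmatrix} = \begin{pmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \end{pmatrix},$$

which has norm 2. Thus $\|\theta_2\| \ge 2$. □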
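The computation in Example 2.4.1 can be reproduced in plain Python for general $d$ (here $d = 2$). The sketch applies the transpose to the first tensor factor of the flip operator and checks that the result is the rank-one operator $|I\rangle\rangle\langle\langle I|$ with $|I\rangle\rangle = \sum_i e_i \otimes e_i$. Since $\langle\langle I|I\rangle\rangle = d$, its norm is $d$, while $\|F\| = 1$ because $F$ is a unitary involution. The function names are illustrative.

```python
def flip(d):
    """Flip operator F(a (x) b) = b (x) a on C^d (x) C^d; basis index (i, k) -> i*d + k."""
    n = d * d
    return [[1 if (r // d == c % d and r % d == c // d) else 0 for c in range(n)]
            for r in range(n)]

def transpose_first_factor(X, d):
    """(theta (x) id)(X): transpose in the first tensor factor only."""
    n = d * d
    return [[X[(c // d) * d + r % d][(r // d) * d + c % d] for c in range(n)]
            for r in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

d = 2
F = flip(d)
theta2_F = transpose_first_factor(F, d)

# |I>> = sum_i e_i (x) e_i (unnormalized), with <<I|I>> = d.
vec_I = [1 if r // d == r % d else 0 for r in range(d * d)]
rank_one = [[a * b for b in vec_I] for a in vec_I]          # |I>><<I|
image_of_I = [sum(theta2_F[r][c] * vec_I[c] for c in range(d * d))
              for r in range(d * d)]
```

Because $\theta_2(F)$ is positive and of rank one, its norm equals its only nonzero eigenvalue, $\langle\langle I|I\rangle\rangle = d$. Repeating the computation with larger $d$ (i.e., $\theta \otimes \mathrm{id}_d$ on $M_d \otimes M_d$) shows that the norm grows without bound.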

A good choice then is the metric induced by the stabilized version of the operator norm (2.42), namely the norm of complete boundedness (or cb-norm for short), defined by [97]

$$\|T\|_{cb} := \sup_n \|T \otimes \mathrm{id}_n\|. \qquad (2.43)$$

For any operator $X \in B(\mathscr{H})$ and any two maps $S, T$ on $B(\mathscr{H})$ with finite cb-norm (in the case of finite-dimensional $\mathscr{H}$, this is always true [97]), we have the relations

$$\|T(X)\| \le \|T\|_{cb}\,\|X\|, \qquad (2.44)$$
$$\|ST\|_{cb} \le \|S\|_{cb}\,\|T\|_{cb}, \qquad (2.45)$$
$$\|S \otimes T\|_{cb} = \|S\|_{cb}\,\|T\|_{cb}. \qquad (2.46)$$

Furthermore, for any completely positive map $T$, we have [97] $\|T\|_{cb} = \|T(I)\|$. This implies, in particular, that $\|T\|_{cb} = 1$ for any channel.

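We have defined the cb-norm (2.43) for channels that act on observables, but we also need a similar norm for the corresponding dual channels that act on states (and, by linear extension, on trace-class operators). Thus let $T : B(\mathscr{H}) \to B(\mathscr{H})$ be a channel, and let $T_* : \mathscr{T}_1(\mathscr{H}) \to \mathscr{T}_1(\mathscr{H})$ be its dual, where $\mathscr{T}_1(\mathscr{H})$ denotes the trace-class operators on $\mathscr{H}$. Then we define the norm of $T_*$ as

$$\|T_*\| := \sup_{X \in \mathscr{T}_1(\mathscr{H});\, \|X\|_1 = 1} \|T_*(X)\|_1,$$

and the corresponding cb-norm as

$$\|T_*\|_{cb} := \sup_n \|T_* \otimes \mathrm{id}_n\|. \qquad (2.47)$$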

Luckily, the cb-norms of $T$ and $T_*$ agree, as follows from the following argument. We have

$$\sup_{\|X\|_1 = 1}\ \sup_{\|Y\| = 1} |\operatorname{tr}[T_*(X)Y]| = \sup_{\|X\|_1 = 1} \|T_*(X)\|_1 = \|T_*\|$$

and

$$\sup_{\|X\|_1 = 1}\ \sup_{\|Y\| = 1} |\operatorname{tr}[T_*(X)Y]| = \sup_{\|Y\| = 1}\ \sup_{\|X\|_1 = 1} |\operatorname{tr}[XT(Y)]| = \sup_{\|Y\| = 1} \|T(Y)\| = \|T\|,$$

which shows that $\|T\| = \|T_*\|$ for any two maps $T : B(\mathscr{H}) \to B(\mathscr{H})$ and $T_* : \mathscr{T}_1(\mathscr{H}) \to \mathscr{T}_1(\mathscr{H})$ that are connected via the duality relation

$$\operatorname{tr}[XT(Y)] = \operatorname{tr}[T_*(X)Y], \quad \forall X \in \mathscr{T}_1(\mathscr{H}),\ \forall Y \in B(\mathscr{H}),$$

and such that at least one of them has finite norm. Therefore we have, for any $n$, $\|T \otimes \mathrm{id}_n\| = \|T_* \otimes \mathrm{id}_n\|$; taking the supremum of both sides with respect to $n$, we get $\|T\|_{cb} = \|T_*\|_{cb}$. This equality holds for completely positive maps in particular, and for completely bounded maps in general (e.g., for sums and differences of completely positive maps). Thus properties similar to (2.44)-(2.46) also hold for the cb-norm (2.47), but with obvious modifications (e.g., with the operator norm replaced by the trace norm). In particular, we have $\|T_*\|_{cb} = 1$ for any channel $T$. The "dual" cb-norm (2.47) has appeared, under different guises, in the work of Aharonov, Kitaev, and Nisan [2], Giedke et al. [19], and Kitaev [66].

If two channels $T, S$ are close in cb-norm, then, for any density operator $\rho$, the corresponding states $T_*(\rho)$, $S_*(\rho)$ are close in trace norm since, from the trace-norm analogue of Eq. (2.44), it follows that

$$\|T_*(\rho) - S_*(\rho)\|_1 = \|(T_* - S_*)(\rho)\|_1 \le \|T_* - S_*\|_{cb} = \|T - S\|_{cb}.$$

In fact, this bound cannot be beaten by adjoining a second system with the Hilbert space $\mathscr{K}$ in some state $\rho_{\mathscr{K}}$, entangling the two systems through some channel $K$
on $B(\mathscr{H} \otimes \mathscr{K})$, and then comparing the channels $T \otimes R$ and $S \otimes R$, where $R : B(\mathscr{K}) \to B(\mathscr{K})$ is some suitably chosen channel. This is evident from the estimate

$$\big\|(T_* \otimes R_*)K_*(\rho \otimes \rho_{\mathscr{K}}) - (S_* \otimes R_*)K_*(\rho \otimes \rho_{\mathscr{K}})\big\|_1 \le \|T_* - S_*\|_{cb},$$

which can be easily obtained by repeated application of Eqs. (2.44)-(2.46). In other words, as far as the cb-norm distinguishability criterion is concerned, entangling the system with an auxiliary system will not improve distinguishability of the channels $T$ and $S$. The cb-norm, however, is an extremely strong distinguishability measure: its definition already accounts for optimization with respect to entanglement and input states over Hilbert spaces of very large (but finite) dimension. There exist weaker measures of channel distinguishability (such as the channel fidelity presented below) that describe how channels may be distinguished with bounded resources. Using these weaker criteria, one may show that the use of entanglement does entail an improvement in the practical distinguishability of both states and channels [27].

2.4.2 Channel fidelity

Recall that the Jozsa-Uhlmann fidelity (2.34) can be given an intuitive operational meaning in terms of the Fisher metric on the manifold of probability distributions. This suggests that any experiment designed to distinguish between two given quantum states amounts to distinguishing a pair of suitable probability distributions.

It is tempting to apply the same idea to distinguishability of channels. Consider the situation portrayed in Fig. 2.1. Namely, suppose that we are given a "black box" that effects one of two channels $S$, $T$. In order to tell what the "black box" does, we can exploit the correspondence (2.27) between channels and bipartite states. That is, we prepare two systems, $A$ and $B$, in the maximally entangled state $|I\rangle\rangle/\sqrt{\dim\mathscr{H}}$ and then let the "black box" act on $A$, while leaving $B$ untouched. The resulting state is, up to normalization, the $R$-operator (2.27) of the unknown channel. In order to distinguish between $S$ and $T$, we simply perform a measurement that would distinguish between the states $(S_* \otimes \mathrm{id})(|I\rangle\rangle\langle\langle I|)$ and $(T_* \otimes \mathrm{id})(|I\rangle\rangle\langle\langle I|)$. In fact, D'Ariano, Lo Presti, and Paris have recently shown [27] that strategies of this kind generally result in improved distinguishability.

Figure 2.1: Using entanglement to distinguish between quantum channels.
The correspondence (2.27) between channels $T : B(\mathscr{H}) \to B(\mathscr{H})$ and positive operators $R_T \in B(\mathscr{H} \otimes \mathscr{H})$ is, as we already stated, bijective: namely, $R_S = R_T$ if and only if $S = T$. Furthermore, it is easy to see that $\rho_T := (1/d)R_T$, where $d = \dim\mathscr{H}$, is a density operator. Hence it seems natural to define the fidelity $\mathcal{F}(S, T)$ between two channels $S$ and $T$ as the Jozsa-Uhlmann fidelity between the density operators $\rho_S$ and $\rho_T$:

$$\mathcal{F}(S, T) := F(\rho_S, \rho_T). \qquad (2.48)$$

Being expressed in terms of the Jozsa-Uhlmann fidelity $F$, the channel fidelity $\mathcal{F}$ inherits many of its natural properties. We now summarize these properties with brief proofs [103].

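Theorem 2.4.2 (properties of the channel fidelity) Let $S, T$ be channels $B(\mathscr{H}) \to B(\mathscr{H})$, and let $d = \dim\mathscr{H}$. Then the channel fidelity $\mathcal{F}$ has the following properties.

1. $0 \le \mathcal{F}(S, T) \le 1$, and $\mathcal{F}(S, T) = 1$ if and only if $S = T$.

2. $\mathcal{F}(S, T) = \mathcal{F}(T, S)$ (symmetry).

3. For any two unitarily implemented channels $\mathcal{U}$ and $\mathcal{V}$ [i.e., $\mathcal{U}_*(\rho) = U\rho U^*$ and $\mathcal{V}_*(\rho) = V\rho V^*$ with unitary $U$ and $V$], $\mathcal{F}(\mathcal{U}, \mathcal{V}) = (1/d^2)\,|\operatorname{tr}(U^*V)|^2$.

4. For any real $\lambda$ with $0 \le \lambda \le 1$, $\mathcal{F}(S, \lambda T_1 + (1 - \lambda)T_2) \ge \lambda\,\mathcal{F}(S, T_1) + (1 - \lambda)\,\mathcal{F}(S, T_2)$ (concavity).

5. $\mathcal{F}(S_1 \otimes S_2, T_1 \otimes T_2) = \mathcal{F}(S_1, T_1)\,\mathcal{F}(S_2, T_2)$ (multiplicativity with respect to tensoring).

6. $\mathcal{F}$ is invariant under composition with unitarily implemented channels, i.e., for any unitarily implemented channel $\mathcal{U}$, $\mathcal{F}(S\mathcal{U}, T\mathcal{U}) = \mathcal{F}(S, T)$ and $\mathcal{F}(\mathcal{U}S, \mathcal{U}T) = \mathcal{F}(S, T)$.

7. $\mathcal{F}$ does not decrease under composition with arbitrary channels, i.e., for any channel $R$, $\mathcal{F}(SR, TR) \ge \mathcal{F}(S, T)$.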

Proof: 

1, 2. These hold because $\rho_S$ and $\rho_T$ are density operators, and because $T \mapsto \rho_T$ is a bijection.

3. $R_U = |U\rangle\rangle\langle\langle U|$, and similarly for $R_V$. Thus both $\rho_U$ and $\rho_V$ are pure states. Since for pure states $\psi, \phi$ we have $F = |\langle\psi|\phi\rangle|^2$, it follows that $\mathcal{F}(U,V) = (1/d^2)\,|\langle\langle U|V\rangle\rangle|^2 = (1/d^2)\,|\mathrm{tr}(U^*V)|^2$.

4. Note that the map $T \mapsto \rho_T$ is linear. Thus, for $T = \lambda T_1 + (1-\lambda)T_2$, we have $\rho_T = \lambda\rho_{T_1} + (1-\lambda)\rho_{T_2}$, and the concavity of $\mathcal{F}$ follows from the concavity of the mixed-state fidelity (2.34).

5. It follows from Eq. (2.31) that $\rho_{S \otimes T} = \rho_S \otimes \rho_T$, and the multiplicativity property of $\mathcal{F}$ follows from the corresponding property of $F$.

6. Write $TU \otimes \mathrm{id} = (T \otimes \mathrm{id})(U \otimes \mathrm{id})$ to obtain $\rho_{TU} = (U \otimes I)\rho_T(U^* \otimes I)$, and do the same for $SU$. Since the mixed-state fidelity $F$ is invariant under unitary transformations, the same property holds for the channel fidelity $\mathcal{F}$. For $UT$, we have $\rho_{UT} = (1/d)(T_* \otimes \mathrm{id})(|U\rangle\rangle\langle\langle U|) = (I \otimes U^{\mathsf{T}})\rho_T(I \otimes (U^{\mathsf{T}})^*)$, and the same holds for $\rho_{US}$. Because $U$ is unitary, $U^{\mathsf{T}}$ is unitary also. Thus $\rho_T$ and $\rho_{UT}$ are unitarily equivalent, as are $\rho_S$ and $\rho_{US}$, and the desired conclusion again follows from the unitary invariance of the Jozsa-Uhlmann fidelity.

7. The same reasoning as before, except now we have to use the property that $F((R_* \otimes \mathrm{id})\rho_S, (R_* \otimes \mathrm{id})\rho_T) \ge F(\rho_S, \rho_T)$. ■



Remark: Property 6 (invariance of $\mathcal{F}$ under unitary transformations) implies that our definition of the channel fidelity is good in the sense that we could have used any maximally entangled pure state to define the density operators $\rho_S$ and $\rho_T$ for the channels $S$ and $T$ to obtain the same numerical value for the fidelity $\mathcal{F}(S,T) := F(\rho_S, \rho_T)$. □
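To make the definition (2.48) concrete, here is a minimal numerical sketch (an illustration, not part of the original development). It assumes that a channel is specified by the Kraus operators of its predual $T_*$, uses the convention $|I\rangle\rangle = \sum_i |i\rangle \otimes |i\rangle$, and evaluates the Jozsa-Uhlmann fidelity in the squared form $F(\rho,\sigma) = (\mathrm{tr}|\sqrt{\rho}\sqrt{\sigma}|)^2$, which reduces to $|\langle\psi|\phi\rangle|^2$ for pure states as in the proof of property 3. The helper names (`choi_state`, `psd_sqrt`, `fidelity`, `channel_fidelity`) are hypothetical. The usage lines check property 3 for two diagonal unitaries.

```python
import numpy as np

def choi_state(kraus, d):
    """rho_T = (1/d) (T_* (x) id)(|I>><<I|) for a predual given by Kraus operators.

    Assumes |I>> = sum_i |i>|i> on C^d (x) C^d, so that tr rho_T = 1.
    """
    omega = np.eye(d, dtype=complex).reshape(d * d, 1)  # column vector sum_i |i>|i>
    proj = omega @ omega.conj().T
    rho = sum(np.kron(K, np.eye(d)) @ proj @ np.kron(K, np.eye(d)).conj().T
              for K in kraus)
    return rho / d

def psd_sqrt(a):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, sigma):
    """Jozsa-Uhlmann fidelity F = (tr|sqrt(rho) sqrt(sigma)|)^2."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(s.sum() ** 2)

def channel_fidelity(kraus_s, kraus_t, d):
    """Channel fidelity (2.48): the fidelity of the normalized Choi states."""
    return fidelity(choi_state(kraus_s, d), choi_state(kraus_t, d))

theta = np.pi / 3
U = np.diag([1.0, np.exp(1j * theta)])
V = np.eye(2, dtype=complex)
# Property 3: F(U, V) = |tr(U^* V)|^2 / d^2, which equals (1 + cos theta) / 2 here.
print(channel_fidelity([U], [V], 2), abs(np.trace(U.conj().T @ V)) ** 2 / 4)
```

For the unitaries above, $|\mathrm{tr}(U^*V)|^2/d^2 = |1 + e^{-i\pi/3}|^2/4 = 3/4$, and the Choi-state computation reproduces this value up to rounding.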

Our next step is to obtain a meaningful analogue of Uhlmann's theorem for the channel fidelity $\mathcal{F}$. In order to do that, we must draw the connection between the channel $T$ and purifications of the density operator $\rho_T$. As a warm-up, let us first prove the following lemma [103].

Lemma 2.4.3 Given a channel $T : \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H})$, where $\mathcal{H}$ and $\mathcal{K}$ are isomorphic Hilbert spaces with $d = \dim\mathcal{H} = \dim\mathcal{K}$, the density operator $\rho_T$ is pure if and only if the channel $T$ is unitarily implemented.

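Proof: Proving the forward implication is easy: $\rho_U = (1/d)|U\rangle\rangle\langle\langle U|$, which is a pure state. Let us now prove the reverse implication. Suppose that, given the channel $T$, the state $\rho_T$ is pure. It follows from Theorem 2.2.21 that the reduced density operator $\mathrm{tr}_{\mathcal{K}}\rho_T$ is a multiple of the identity, i.e., a maximally mixed state. Since $\rho_T$ is pure, the reduced density operators $\mathrm{tr}_{\mathcal{K}}\rho_T$ and $\mathrm{tr}_{\mathcal{H}}\rho_T$ have the same nonzero eigenvalues [83]. All eigenvalues of $\mathrm{tr}_{\mathcal{K}}\rho_T = (1/d)I$ are equal and positive. Since $\mathcal{H} \cong \mathcal{K}$ by assumption, $\mathrm{tr}_{\mathcal{K}}\rho_T$ and $\mathrm{tr}_{\mathcal{H}}\rho_T$ are isospectral. Hence $\rho_T$ is a maximally entangled state and therefore has the form $(1/d)|U\rangle\rangle\langle\langle U|$ for some unitary $U$ (cf. Example 2.2.17). Using Eq. (2.24), we can write

$$|U\rangle\rangle\langle\langle U| = (U \otimes I)|I\rangle\rangle\langle\langle I|(U^* \otimes I),$$

so that $\rho_T$ coincides with the density operator of the unitarily implemented channel $\rho \mapsto U\rho U^*$. By the bijectivity of the correspondence $T \mapsto \rho_T$, this implies that $T$ is a unitarily implemented channel. ■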

Therefore, for any two unitarily implemented channels $U$ and $V$, the states $\rho_U, \rho_V$ are already pure and, as stated in Theorem 2.4.2,

$$\mathcal{F}(U,V) = \frac{1}{d^2}\,|\langle\langle U|V\rangle\rangle|^2 = \frac{1}{d^2}\,|\mathrm{tr}(U^*V)|^2, \qquad (2.49)$$

which is nothing but the squared modulus of the normalized Hilbert-Schmidt inner product of the operators $U$ and $V$. As we shall now show, the fidelity $\mathcal{F}(S,T)$ for arbitrary channels $S, T : \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H})$ can be expressed as a maximum of expressions similar to the right-hand side of Eq. (2.49), but with the difference that, in place of the unitaries $U$ and $V$, there will appear certain isometries from $\mathcal{H}$ to $\mathcal{K} \otimes \mathcal{E}$, where $\mathcal{E}$ is a suitably defined auxiliary Hilbert space. The following theorem [103] states this in precise terms.



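Theorem 2.4.4 Let $S, T : \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H})$ be channels, where $\mathcal{H}$ and $\mathcal{K}$ are finite-dimensional Hilbert spaces. Then we can choose a Hilbert space $\mathcal{E}$ and two isometries $V, W : \mathcal{H} \to \mathcal{K} \otimes \mathcal{E}$ such that, for any $A \in \mathcal{B}(\mathcal{K})$,

$$S(A) = V^*(A \otimes 1_{\mathcal{E}})V \qquad (2.50)$$
$$T(A) = W^*(A \otimes 1_{\mathcal{E}})W, \qquad (2.51)$$

and the isometries $V, W$ are unique up to a unitary transformation of $\mathcal{E}$. Furthermore,

$$\mathcal{F}(S,T) = \max_{V,W}\frac{1}{d^2}\,|\mathrm{tr}(V^*W)|^2, \qquad (2.52)$$

where $d = \dim\mathcal{H}$ and the maximum is taken over all such isometries $V$ and $W$.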

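Proof: Consider the channel $S$. Given any Kraus decomposition $\{V_\alpha\}$ of $S_*$, we can define the isometry $V$ through Eq. (2.16). It can be shown that we can always choose a Kraus decomposition of $S$ in such a way that the operators forming it are linearly independent in the sense of Hilbert-Schmidt; such a decomposition, referred to as the minimal Kraus decomposition [90], will consist of at most $\dim\mathcal{H} \cdot \dim\mathcal{K}$ operators. The same holds for the channel $T$. Then we can choose the Hilbert space $\mathcal{E} \cong \mathcal{H} \otimes \mathcal{K}$ and add as many zero operators to the given minimal Kraus decompositions $\{V_\alpha\}$ and $\{W_\alpha\}$ of $S$ and $T$ as necessary.

Assuming that this has been done, we can write

$$R_S = \sum_\alpha |V_\alpha\rangle\rangle\langle\langle V_\alpha|$$

and construct the purification of $\rho_S$ in $\mathcal{K} \otimes \mathcal{H} \otimes \mathcal{E}$,

$$\psi_S = \frac{1}{\sqrt{\dim\mathcal{H}}}\sum_\alpha |V_\alpha\rangle\rangle \otimes e_\alpha,$$

where $\{e_\alpha\}$ is an orthonormal basis of $\mathcal{E}$. Let us define the isometry $V' : \mathcal{H} \otimes \mathcal{H} \to \mathcal{K} \otimes \mathcal{H} \otimes \mathcal{E}$ through

$$V'(\psi \otimes \phi) := \sum_\alpha (V_\alpha\psi) \otimes \phi \otimes e_\alpha,$$

in which case we have $\psi_S = (1/\sqrt{\dim\mathcal{H}})\,V'|I\rangle\rangle$. Do the same thing for the channel $T$ to arrive at the purification $\psi_T = (1/\sqrt{\dim\mathcal{H}})\,W'|I\rangle\rangle$ of $\rho_T$. From Uhlmann's theorem (Theorem 2.3.2) we have

$$F(\rho_S, \rho_T) = \max_{\psi_S, \psi_T}|\langle\psi_S|\psi_T\rangle|^2.$$

It is easily shown that $\langle\psi_S|\psi_T\rangle = (1/\dim\mathcal{H})\,\langle\langle I|V'^*W'|I\rangle\rangle = (1/\dim\mathcal{H})\,\mathrm{tr}\,V^*W$. Hence the maximization over the purifications $\psi_S$ and $\psi_T$ of $\rho_S$ and $\rho_T$ is equivalent to the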
maximization over the isometries V and W, and the theorem is proved. ■ 
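Formula (2.52) can also be checked numerically. Writing the isometries as $V\psi = \sum_\alpha V_\alpha\psi \otimes e_\alpha$ and $W\psi = \sum_\alpha W_\alpha\psi \otimes e_\alpha$, the remaining freedom is a unitary $u$ on $\mathcal{E}$, and $\mathrm{tr}(V^*(1 \otimes u)W) = \sum_{\alpha,\beta} u_{\alpha\beta}\,\mathrm{tr}(V_\alpha^*W_\beta)$; maximizing the modulus over unitaries gives the trace norm $\|G\|_1$ of the matrix $G_{\alpha\beta} = \mathrm{tr}(V_\alpha^*W_\beta)$. So, under these assumptions, the right-hand side of (2.52) should equal $\|G\|_1^2/d^2$. The sketch below compares this with the Choi-state fidelity, using the standard amplitude-damping and depolarizing Kraus operators purely as test inputs; the helper names are hypothetical, and an unequal number of Kraus operators plays the role of the zero padding in the proof.

```python
import numpy as np

def choi_state(kraus, d):
    """(1/d) (T_* (x) id)(|I>><<I|) with |I>> = sum_i |i>|i>."""
    omega = np.eye(d, dtype=complex).reshape(d * d, 1)
    proj = omega @ omega.conj().T
    return sum(np.kron(K, np.eye(d)) @ proj @ np.kron(K, np.eye(d)).conj().T
               for K in kraus) / d

def psd_sqrt(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, sigma):
    """Jozsa-Uhlmann fidelity (tr|sqrt(rho) sqrt(sigma)|)^2."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum() ** 2)

def kraus_overlap_fidelity(kraus_s, kraus_t, d):
    """(1/d^2) ||G||_1^2 with G[a, b] = tr(V_a^* W_b): the maximum in (2.52)."""
    g = np.array([[np.trace(Va.conj().T @ Wb) for Wb in kraus_t] for Va in kraus_s])
    return float(np.linalg.svd(g, compute_uv=False).sum() ** 2 / d**2)

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

gamma, p = 0.3, 0.4
amp = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]), np.array([[0, np.sqrt(gamma)], [0, 0]])]
dep = [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * S for S in (SX, SY, SZ)]

print(fidelity(choi_state(amp, 2), choi_state(dep, 2)), kraus_overlap_fidelity(amp, dep, 2))
```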

Finally we consider the relation of the channel fidelity $\mathcal{F}$ to the cb-norm. Using the properties of the latter, as well as the Fuchs-van de Graaf theorem (Theorem 2.3.4), we easily obtain the inequality

$$2 - 2\sqrt{\mathcal{F}(S,T)} \le \|S - T\|_{\mathrm{cb}}. \qquad (2.53)$$

It is certainly an interesting and important problem to derive an upper bound on $\|S - T\|_{\mathrm{cb}}$ in terms of $\mathcal{F}(S,T)$. We can expect that this upper bound will not be nearly as tight as
the lower bound because, as we have indicated above, the cb-norm distance is a much 
stronger distinguishability criterion than the channel fidelity. 
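The lower bound (2.53) passes through the normalized $R$-operators: by Fuchs-van de Graaf, $2 - 2\sqrt{F(\rho_S,\rho_T)} \le \|\rho_S - \rho_T\|_1$, and $\|\rho_S - \rho_T\|_1 \le \|S - T\|_{\mathrm{cb}}$ because $\rho_S - \rho_T$ is the output of $(S_* - T_*) \otimes \mathrm{id}$ on a state. Computing the cb-norm itself requires a semidefinite program and is not attempted here. The sketch below (numpy only, hypothetical helper names, qubit depolarizing channels as arbitrary test inputs) checks only the first link of this chain, together with the matching Fuchs-van de Graaf upper bound $\|\rho - \sigma\|_1 \le 2\sqrt{1 - F}$.

```python
import numpy as np

def choi_state(kraus, d):
    omega = np.eye(d, dtype=complex).reshape(d * d, 1)
    proj = omega @ omega.conj().T
    return sum(np.kron(K, np.eye(d)) @ proj @ np.kron(K, np.eye(d)).conj().T
               for K in kraus) / d

def psd_sqrt(a):
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, sigma):
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum() ** 2)

def trace_norm(a):
    return float(np.abs(np.linalg.eigvalsh(a)).sum())

def depolarizing_kraus(p):
    """Kraus operators of rho -> (1 - p) rho + p I/2 on a qubit."""
    paulis = [np.array([[0, 1], [1, 0]], dtype=complex),
              np.array([[0, -1j], [1j, 0]], dtype=complex),
              np.array([[1, 0], [0, -1]], dtype=complex)]
    return [np.sqrt(1 - 3 * p / 4) * np.eye(2, dtype=complex)] + [np.sqrt(p / 4) * s for s in paulis]

rows = []  # (2 - 2 sqrt F, ||rho_S - rho_T||_1, 2 sqrt(1 - F)) for each pair of channels
for p, q in [(0.1, 0.5), (0.0, 1.0), (0.3, 0.35)]:
    rs, rt = choi_state(depolarizing_kraus(p), 2), choi_state(depolarizing_kraus(q), 2)
    f = fidelity(rs, rt)
    rows.append((2 - 2 * np.sqrt(f), trace_norm(rs - rt), 2 * np.sqrt(max(0.0, 1 - f))))
print(rows)
```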

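However, in the case when one of the channels is the identity channel, and the other one is an arbitrary channel $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$, we actually can bound the channel fidelity in terms of the cb-norm both above and below. For this purpose we need the off-diagonal fidelity of the channel $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$, defined by [144]

$$\mathcal{F}_{\mathrm{od}}(T) := \sup\,\mathrm{Re}\,\langle\phi|T_*(|\phi\rangle\langle\psi|)|\psi\rangle,$$

for which we have the inequality

$$\|T - \mathrm{id}\|_{\mathrm{cb}} \le 4\sqrt{1 - \mathcal{F}_{\mathrm{od}}(T)}.$$

Then $\mathcal{F}(T,\mathrm{id}) \le \mathcal{F}_{\mathrm{od}}(T \otimes \mathrm{id})$, so that, using the fact that the cb-norm is multiplicative with respect to tensor products, we get

$$\|T - \mathrm{id}\|_{\mathrm{cb}} \le 4\sqrt{1 - \mathcal{F}(T,\mathrm{id})}. \qquad (2.54)$$

Combining inequalities (2.53) and (2.54) yields

$$\Bigl(1 - \frac12\|T - \mathrm{id}\|_{\mathrm{cb}}\Bigr)^2 \le \mathcal{F}(T,\mathrm{id}) \le 1 - \frac{1}{16}\|T - \mathrm{id}\|_{\mathrm{cb}}^2. \qquad (2.55)$$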

The upper bound in this inequality is not nearly as tight as the lower bound. Indeed, when $\|T - \mathrm{id}\|_{\mathrm{cb}}$ equals its maximum value of 2, the fidelity $\mathcal{F}(T,\mathrm{id})$ can take any value between 0 and 3/4. This serves as yet another indication that the cb-norm is a much more stringent distinguishability criterion than the channel fidelity $\mathcal{F}$.
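For completeness, the two halves of (2.55) follow from (2.53) and (2.54) by elementary algebra; the only extra ingredient is $0 \le \|T - \mathrm{id}\|_{\mathrm{cb}} \le 2$, which guarantees that the quantity squared in the lower bound is nonnegative:

```latex
\begin{align*}
2 - 2\sqrt{\mathcal{F}(T,\mathrm{id})} \le \|T-\mathrm{id}\|_{\mathrm{cb}}
  &\;\Longrightarrow\; \sqrt{\mathcal{F}(T,\mathrm{id})} \ge 1 - \tfrac12\|T-\mathrm{id}\|_{\mathrm{cb}} \ge 0
  \;\Longrightarrow\; \mathcal{F}(T,\mathrm{id}) \ge \bigl(1 - \tfrac12\|T-\mathrm{id}\|_{\mathrm{cb}}\bigr)^2, \\
\|T-\mathrm{id}\|_{\mathrm{cb}} \le 4\sqrt{1-\mathcal{F}(T,\mathrm{id})}
  &\;\Longrightarrow\; \|T-\mathrm{id}\|_{\mathrm{cb}}^2 \le 16\bigl(1-\mathcal{F}(T,\mathrm{id})\bigr)
  \;\Longrightarrow\; \mathcal{F}(T,\mathrm{id}) \le 1 - \tfrac{1}{16}\|T-\mathrm{id}\|_{\mathrm{cb}}^2 .
\end{align*}
```

At $\|T - \mathrm{id}\|_{\mathrm{cb}} = 2$ the two bounds become $0 \le \mathcal{F}(T,\mathrm{id}) \le 1 - 4/16 = 3/4$.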
Among the basic postulates laid down by the founding fathers of statistical physics, there is the so-called zeroth law of thermodynamics [o(), p. 3], [121, p. 18], which expresses formally the empirical fact that any large system will normally be observed in an equilibrium state characterized by a few macroscopic parameters, and that any system not in equilibrium will rapidly approach it. This process of return to equilibrium is normally referred to as relaxation [72]. One of the main parameters of a relaxation process is the relaxation time $\tau_{\mathrm{relax}}$: if we disturb a large system at $t = 0$ and then observe it again at $t \gg \tau_{\mathrm{relax}}$, we will find, with high probability, that the system is in a state arbitrarily close to equilibrium.

The classic example of a relaxation process is the phenomenon of thermalization, i.e., when a physical system reaches thermal equilibrium with its surroundings, the latter being maintained at an absolute temperature $T$. It is a basic result in statistical mechanics [132, p. 153] that the corresponding equilibrium state is precisely the canonical (or Gibbs) state

$$\rho_\beta = \frac{e^{-\beta H}}{Z_\beta},$$

where $H$ is the Hamiltonian of the system, $\beta := 1/k_B T$ is the inverse temperature, and the normalizing factor $Z_\beta := \mathrm{tr}\,e^{-\beta H}$ is known as the canonical partition function. We will discuss the Gibbs state in greater detail in Ch. 4, when we talk about the Gibbs variational principle.

The ultimate goal of the experimental research into quantum information processing 
is the construction of a reliable large-scale quantum computer. Such a computer will 
necessarily be a macroscopic system subject to the laws of thermodynamics; therefore it 
makes sense to deal with such things as approach to equilibrium, and relaxation processes 
in general, in the context of quantum information theory. 
3.1 Relaxation processes and channels

How can we model a relaxation process using the tools of quantum information theory? A good way to do this is by means of a discrete-time version of a quantum dynamical semigroup [31]. Consider a quantum system with the algebra of observables $\mathcal{A}$, and let $T : \mathcal{A} \to \mathcal{A}$ be a channel¹. Then the dynamics is given by the semigroup $\{T^n\}_{n \in \mathbb{N}}$, so that $T$ represents a single step of the dynamics. Given any initial state $\omega_0 \in \mathcal{S}(\mathcal{A})$, we can track the evolution of the system under the semigroup dynamics by following the orbit $\{\omega_n\}$, where $\omega_n := \omega_0 \circ T^n$. The dynamics defined in this way is stationary (i.e., the evolution law is independent of time) and Markovian (i.e., the state $\omega_{n+1} = \omega_n \circ T$ depends only on the preceding state $\omega_n$, and hence $\omega_n$ depends only on the initial state $\omega_0$).

A good model of relaxation should satisfy the following natural requirements. (1) There should exist a unique state $\omega_T \in \mathcal{S}(\mathcal{A})$ such that $\omega_T \circ T = \omega_T$. (2) For any choice of the initial state $\omega_0$, the sequence $\{\omega_n\}$ should converge weakly* to $\omega_T$, in the sense of the following definition.

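Definition 3.1.1 Let $\mathcal{X}$ be a Banach space with the dual Banach space $\mathcal{X}^*$. Then the sequence $\{\omega_n\}$ in $\mathcal{X}^*$ is said to converge to some $\omega \in \mathcal{X}^*$ in the weak* sense, written as $\text{w*-}\lim_{n\to\infty}\omega_n = \omega$, if for any $X \in \mathcal{X}$ we have $\lim_{n\to\infty}\omega_n(X) = \omega(X)$.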



It can be shown [16, p. 68] that weak* convergence of a sequence $\{\omega_n\}$ of normal states implies trace-norm convergence of the sequence $\{\rho_n\}$ of the corresponding density operators, and vice versa. (3) The convergence of the orbit $\{\omega_n\}$ to the equilibrium state $\omega_T$ should be exponential, i.e., there should exist a constant $k$ with $0 < k < 1$ such that, for any $A \in \mathcal{A}$, $|\omega_n(A) - \omega_T(A)| \le C_A k^n$, where $C_A$ is a constant depending on $A$. Our reason for insisting on this is dictated essentially by the zeroth law.

The first requirement states that the channel T must be ergodic, according to the 
following definition. 

Definition 3.1.2 Let $\mathcal{A}$ be an algebra of observables. A channel $T : \mathcal{A} \to \mathcal{A}$ is called ergodic if there exists a unique state $\omega_T \in \mathcal{S}(\mathcal{A})$ such that $\omega_T \circ T = \omega_T$.

Ergodicity of the channel $T$ implies that, for any state $\omega \in \mathcal{S}(\mathcal{A})$, the ergodic mean

$$\bar{\omega}_N := \frac{1}{N+1}\sum_{n=0}^{N}\omega \circ T^n \qquad (3.2)$$

converges weakly* to the unique $T$-invariant state $\omega_T$. In terms of the corresponding density operators, we would then have

$$\lim_{N\to\infty}\Bigl\|\frac{1}{N+1}\sum_{n=0}^{N}T_*^n(\rho) - \rho_T\Bigr\|_1 = 0,$$

where $T_*$ is the dual channel corresponding to $T$, and $\rho_T$ is the density operator corresponding to the unique $T$-invariant state, i.e., $T_*(\rho_T) = \rho_T$.

¹We have made the unfortunate choice of denoting a typical channel by the letter $T$, the same letter also being used for temperature. We hope that it will always be clear from the context what the letter $T$ stands for.

To prove the weak* convergence of the ergodic mean (3.2) to $\omega_T$, we first note that, because the state space of a C*-algebra with identity is weakly* compact [16, p. 53], the sequence $\{\bar{\omega}_N\}$ has a weakly* convergent subsequence. Furthermore, for any $A \in \mathcal{A}$,

$$|(\bar{\omega}_N \circ T)(A) - \bar{\omega}_N(A)| = \frac{1}{N+1}\,|(\omega \circ T^{N+1})(A) - \omega(A)| \le \frac{2\|A\|}{N+1},$$

i.e., any weakly* convergent subsequence of $\{\bar{\omega}_N\}$ has a $T$-invariant limit. Because the $T$-invariant state $\omega_T$ is unique, we see that every weakly* convergent subsequence of $\{\bar{\omega}_N\}$ converges to $\omega_T$. Thus it follows that

$$\text{w*-}\lim_{N\to\infty}\frac{1}{N+1}\sum_{n=0}^{N}\omega \circ T^n = \omega_T \qquad \forall\,\omega \in \mathcal{S}(\mathcal{A})$$

by weak* compactness of $\mathcal{S}(\mathcal{A})$.
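The gap between convergence of the ergodic mean and convergence of the orbit itself shows up already in a small example of our own choosing (hypothetical, not taken from the text): the qubit channel $T_*(\rho) = \sigma_1\,\mathrm{diag}(\rho)\,\sigma_1$, i.e., complete dephasing in the eigenbasis of $\sigma_3$ followed by a bit flip. Its only invariant state is $I/2$, so it is ergodic, but the orbit of $|0\rangle\langle 0|$ alternates between $|0\rangle\langle 0|$ and $|1\rangle\langle 1|$ and never converges. The Cesàro averages of (3.2) nevertheless converge, at rate $1/(N+1)$.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)

def dephase_then_flip(rho):
    """Hypothetical ergodic-but-not-mixing channel: keep the diagonal, then conjugate by sigma_1."""
    return SX @ np.diag(np.diag(rho)) @ SX

def trace_norm(a):
    return float(np.abs(np.linalg.eigvalsh(a)).sum())

rho_T = np.eye(2) / 2                              # the unique invariant state
orbit = [np.array([[1, 0], [0, 0]], dtype=complex)]
for _ in range(200):
    orbit.append(dephase_then_flip(orbit[-1]))

# Distance of the orbit itself from rho_T: stays equal to 1 (no mixing).
orbit_dist = [trace_norm(r - rho_T) for r in orbit]
# Ergodic means (1/(N+1)) sum_{n<=N} T_*^n(rho): distance 0 for odd N, 1/(N+1) for even N.
means = np.cumsum(np.array(orbit), axis=0) / np.arange(1, len(orbit) + 1)[:, None, None]
mean_dist = [trace_norm(m - rho_T) for m in means]
print(orbit_dist[-3:], mean_dist[-3:])
```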

However, mere ergodicity of $T$ is not sufficient; we also demand weak* convergence of the orbit $\{\omega \circ T^n\}$ for any $\omega \in \mathcal{S}(\mathcal{A})$. Thus, for any observable $A \in \mathcal{A}$, the sequence of the expectation values $\{(\omega \circ T^n)(A)\}$ should converge to the expectation value $\omega_T(A)$, i.e., the dynamics should be mixing. In terms of the corresponding density operators, we must have

$$\lim_{n\to+\infty}\|T_*^n(\rho) - \rho_T\|_1 = 0$$

for all $\rho$. While mixing implies ergodicity, the converse is not necessarily true. It turns out, however, that, at least in the case when the underlying Hilbert space $\mathcal{H}$ is finite-dimensional, there is a condition under which ergodicity and mixing are equivalent. This is the content of the following result of Werner, stated in the article by Terhal and DiVincenzo [135].

Theorem 3.1.3 (Werner) Let $\mathcal{H}$ be a Hilbert space of finite dimension $d$. Let $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ be a channel with the dual channel $T_*$. Suppose that the map $T_*$, extended linearly to all of $\mathcal{B}(\mathcal{H})$, has a unique fixed point $\rho_T \in \mathcal{S}(\mathcal{H})$. Then there exist a polynomial $P$ and a constant $k \in (0,1)$ such that, for any $\rho \in \mathcal{S}(\mathcal{H})$,

$$\|T_*^n(\rho) - \rho_T\|_1 \le C_d\,P(n)\,k^n, \qquad (3.3)$$

where the constant $C_d$ depends on the dimension $d$. Furthermore, if we view $T_*$ as a linear operator on $\mathcal{B}(\mathcal{H})$ with the eigenvalues $\mu_m$, then $k = \max_{m:\,\mu_m \ne 1}|\mu_m|$. If $T_*$ is diagonalizable, then the estimate (3.3) holds with $P = 1$.
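The spectral constant $k$ of Theorem 3.1.3 is easy to compute for a concrete channel. The sketch below (our illustration, with hypothetical helper names) represents $T_*$ as a $d^2 \times d^2$ matrix acting on column-stacked density matrices, using $\mathrm{vec}(AXB) = (B^{\mathsf{T}} \otimes A)\,\mathrm{vec}(X)$, and takes the largest modulus among the eigenvalues different from 1. For the amplitude-damping channel with parameter $\gamma$ (standard Kraus operators, used here as an assumed test case) the eigenvalues are $1$, $1-\gamma$ and $\sqrt{1-\gamma}$ (twice), so $k = \sqrt{1-\gamma}$; iterating from the excited state also confirms a bound of the form (3.3) with $P = 1$.

```python
import numpy as np

def superoperator(kraus):
    """Matrix of rho -> sum_a K_a rho K_a^* acting on column-stacked vec(rho)."""
    return sum(np.kron(K.conj(), K) for K in kraus)

def nontrivial_spectral_radius(kraus, tol=1e-9):
    """k = max |mu| over eigenvalues mu != 1, as in Theorem 3.1.3."""
    mu = np.linalg.eigvals(superoperator(kraus))
    return float(max(abs(m) for m in mu if abs(m - 1) > tol))

gamma = 0.36
amp = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]),
       np.array([[0, np.sqrt(gamma)], [0, 0]])]
k = nontrivial_spectral_radius(amp)          # sqrt(1 - 0.36) = 0.8

S = superoperator(amp)
rho = np.array([[0, 0], [0, 1]], dtype=complex)    # excited state
fixed = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|, the unique fixed point
dists = []
for n in range(30):
    dists.append(float(np.abs(np.linalg.eigvalsh(rho - fixed)).sum()))
    rho = (S @ rho.reshape(-1, order="F")).reshape(2, 2, order="F")
print(k, dists[:4])
```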

The main requirement of Theorem 3.1.3 is the uniqueness of the fixed point of $T_*$ in the entire algebra $\mathcal{B}(\mathcal{H})$, not just in the state space $\mathcal{S}(\mathcal{H})$. However, if the only
information we have is that there is a unique T-invariant state, the above criterion may 
not apply. Fortunately, there exist other methods of proving the mixing property, such as 
the following theorem [132, p. 52]. 
Theorem 3.1.4 (Liapunov's direct method) Let $\mathcal{X}$ be a separable compact space, and let $\tau : \mathcal{X} \to \mathcal{X}$ be a continuous map. Suppose that there exists a strict Liapunov function for $\tau$, i.e., a continuous functional $f$ on $\mathcal{X}$ such that, for any $x \in \mathcal{X}$, $(f \circ \tau)(x) > f(x)$ unless $\tau(x) = x$. Suppose also that $\tau$ has a unique fixed point $x_\tau \in \mathcal{X}$. Then, for any $x \in \mathcal{X}$, the sequence $\{\tau^n(x)\}$ converges to $x_\tau$.

Remark: In order for Theorem 3.1.4 to be applicable, the topology of $\mathcal{X}$ must be such that (a) $\mathcal{X}$ is separable and compact, (b) $\tau$ is continuous, and (c) $f$ is continuous. Then the sequence $\{\tau^n(x)\}$ converges in this topology. □

Finally we come to the last desideratum on our list, namely the exponential convergence of the sequence $\{T_*^n(\rho)\}$. When the algebra of observables is finite-dimensional, Theorem 3.1.3 says that this holds whenever $T_*$ has a unique fixed point (which would necessarily be a density operator [135]): the right-hand side of (3.3) decays exponentially for any polynomial $P$, and if $T_*$ is in addition a diagonalizable linear operator on $\mathcal{B}(\mathcal{H})$, the polynomial prefactor can be dropped. However, if the only piece of information we have to go on is the uniqueness of the $T$-invariant state, then Theorem 3.1.3 will not apply. We can only say that, in general, the exact convergence rate of the orbit $\{T_*^n(\rho)\}$ will depend on the spectrum of $T_*$.

Explicit models of relaxation processes were constructed using the tools of quantum 
information theory in the articles of Scarani et al. [116] and Ziman et al. [152]. Also, 
an interesting paper by Terhal and DiVincenzo [135] investigates the possibility of using 
quantum computers to simulate relaxation processes. In this chapter we describe another 
approach to this problem, via the so-called strictly contractive channels. Our exposition 
closely follows Ref. [101]. 

Before we go on, we make one important comment concerning notation. For most of our discussion in this chapter, as well as in the next chapter, we will deal with transformations of states (i.e., the Schrödinger picture). Therefore, if we are given a system with the Hilbert space $\mathcal{H}$, we will use the term "channel" to refer to any completely positive trace-preserving linear map $T : \mathcal{S}(\mathcal{H}) \to \mathcal{S}(\mathcal{H})$, and we will also omit the asterisk subscript in order to avoid cluttered equations. On those rare occasions when we do talk about the Heisenberg picture, the corresponding map will be denoted by T.

3.2 Strictly contractive channels 
3.2.1 Definition 

Recall Theorem 2.3.1, which states that, for any channel $T$ and any two density operators $\rho, \sigma$ in its domain,

$$\|T(\rho) - T(\sigma)\|_1 \le \|\rho - \sigma\|_1.$$

In other words, any channel is a contraction in the trace norm on the set of density 
operators. Now consider the following definition. 

Definition 3.2.1 A channel $T : \mathcal{S}(\mathcal{H}) \to \mathcal{S}(\mathcal{H})$ is called strictly contractive if there exists a constant $k \in [0,1)$, called the contractivity modulus, such that

$$\|T(\rho) - T(\sigma)\|_1 \le k\|\rho - \sigma\|_1 \qquad (3.4)$$

for any pair of density operators $\rho, \sigma \in \mathcal{S}(\mathcal{H})$.

As shown in Fig. 3.1, the action of a strictly contractive channel on the set $\mathcal{S}(\mathcal{H})$ can be visualized as a uniform shrinking of the trace-norm distance between any two density operators $\rho$, $\sigma$.

It is easily seen that any strictly contractive channel satisfies our requirements for a relaxation dynamics. First of all, the set $\mathcal{S}(\mathcal{H})$ is a closed subset of the Banach space $\mathcal{T}_1(\mathcal{H})$ of the trace-class operators on $\mathcal{H}$. Then the contraction mapping principle (cf. Section A.4) tells us that there exists a unique density operator $\rho_T \in \mathcal{S}(\mathcal{H})$ such that $T(\rho_T) = \rho_T$. Furthermore, given any pair $\rho_0, \sigma_0 \in \mathcal{S}(\mathcal{H})$, consider the orbits $\{T^n(\rho_0)\}$ and $\{T^n(\sigma_0)\}$. Strict contractivity shows that these orbits get exponentially close to each other with $n$ because

$$\|T^n(\rho_0) - T^n(\sigma_0)\|_1 \le k^n\|\rho_0 - \sigma_0\|_1.$$

Furthermore, each orbit converges to $\rho_T$ as $n \to \infty$ (take $\sigma_0 = \rho_T$ above). Thus the image of $\mathcal{S}(\mathcal{H})$ under the iterates $T^n$ shrinks to a point (namely, $\rho_T$) exponentially fast. These features naturally lead us toward mixing, and hence ergodicity, because the trace-norm convergence of the orbit $\{T^n(\rho)\}$ implies the convergence of the expectation values $\mathrm{tr}[AT^n(\rho)]$ to $\mathrm{tr}(A\rho_T)$ for any $A \in \mathcal{B}(\mathcal{H})$.
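The exponential merging of orbits can be checked directly on the depolarizing channel of Example 3.2.3, for which $\|D_p(\rho) - D_p(\sigma)\|_1 = (1-p)\|\rho - \sigma\|_1$ holds with equality. In this sketch (our illustration; the initial states and the value of $p$ are arbitrary choices) the distance between two orbits shrinks exactly by the factor $k = 1 - p$ per step, and each orbit approaches the fixed point $I/d$ at the same rate.

```python
import numpy as np

def depolarize(rho, p):
    """D_p(rho) = (1 - p) rho + p (tr rho) I/d."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.trace(rho) * np.eye(d) / d

def trace_norm(a):
    return float(np.abs(np.linalg.eigvalsh(a)).sum())

p = 0.25                                   # contractivity modulus k = 1 - p = 0.75
ket0 = np.array([1, 0], dtype=complex)
ketp = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho, sigma = np.outer(ket0, ket0.conj()), np.outer(ketp, ketp.conj())

gap, to_fixed = [], []
for n in range(12):
    gap.append(trace_norm(rho - sigma))               # ||T^n(rho_0) - T^n(sigma_0)||_1
    to_fixed.append(trace_norm(rho - np.eye(2) / 2))  # ||T^n(rho_0) - rho_T||_1
    rho, sigma = depolarize(rho, p), depolarize(sigma, p)
print(gap[:3], to_fixed[:3])
```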

Another feature of strictly contractive channels is that they render the states of the system less distinguishable in the sense of quantum detection theory (see Section 2.3.3). To see this, we observe that no two density operators in the image of $\mathcal{S}(\mathcal{H})$ under a strictly contractive channel can be farther than $2k$ from one another in terms of the trace-norm distance (since $\|\rho - \sigma\|_1 \le 2$ for any pair of density operators). By the Helstrom formula $P_c = \frac12 + \frac14\|\rho - \sigma\|_1$, this puts an upper bound on the optimum probability of correct discrimination between any two equiprobable density operators in the image of a strictly contractive channel in a binary quantum detection scheme, namely

$$P_c \le \frac{1+k}{2}.$$

Thus there is always a nonzero probability of making an error, which satisfies the bound

$$P_e \ge \frac{1-k}{2}.$$
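The bound $P_e \ge (1-k)/2$ can be tested numerically with the Helstrom error probability $P_e = \frac12 - \frac14\|\rho - \sigma\|_1$ for two equiprobable states. The sketch below is an illustration with the depolarizing channel ($k = 1 - p$) and randomly drawn pure input pairs; the helper names are hypothetical. The orthogonal pair $|0\rangle$, $|1\rangle$ attains the bound exactly.

```python
import numpy as np

def depolarize(rho, p):
    d = rho.shape[0]
    return (1 - p) * rho + p * np.trace(rho) * np.eye(d) / d

def trace_norm(a):
    return float(np.abs(np.linalg.eigvalsh(a)).sum())

def helstrom_error(rho, sigma):
    """Minimum error probability for discriminating two equiprobable states."""
    return 0.5 - 0.25 * trace_norm(rho - sigma)

def random_pure(rng, d=2):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

p = 0.2
k = 1 - p
rng = np.random.default_rng(0)
errors = [helstrom_error(depolarize(random_pure(rng), p), depolarize(random_pure(rng), p))
          for _ in range(1000)]
e_orth = helstrom_error(depolarize(np.diag([1.0, 0.0]), p), depolarize(np.diag([0.0, 1.0]), p))
print(min(errors), e_orth, (1 - k) / 2)
```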



In any realistic setting, hardly any event occurs with probability exactly equal to unity. For instance, we can never prepare a pure state $|\psi\rangle\langle\psi|$ but rather a mixture $(1-\varepsilon)|\psi\rangle\langle\psi| + \varepsilon\rho$, where both $\varepsilon$ and $\rho$ depend on the particulars of the preparation procedure. Similarly, the measuring device that would ideally identify $|\psi\rangle$ perfectly will instead be realized by $(1-\delta)|\psi\rangle\langle\psi| + \delta F$, where $\delta$ and the operator $F$, $0 \le F \le I$, are again determined by practice.
If we assume that, in any physically realizable quantum computer, all state preparation, 
manipulation, and registration procedures can be carried out with finite precision, then it 
is reasonable to expect that there exist strict bounds on all probabilities that figure in the 
description of the computer's operation. 

As we argued in Chapter 1, any imprecision in a nonideal quantum computer can be traced back to our inability to distinguish between quantum states beyond a certain resolution threshold. In other words, in any experimental situation there will always be some small $\varepsilon_0$ such that any two states with $\|\rho - \sigma\|_1 < \varepsilon_0$ must be considered practically indistinguishable. It follows from the discussion above that strictly contractive channels capture this intuition mathematically. Furthermore, as we will see later, any channel can be approximated arbitrarily closely in cb-norm by a strictly contractive channel. This means that, for any channel $T$ and any $\varepsilon > 0$, there exists some strictly contractive channel $T'$ such that $\|T - T'\|_{\mathrm{cb}} < \varepsilon$. Then, using Eq. (2.53), we see that $\mathcal{F}(T,T') \ge (1 - \varepsilon/2)^2 \approx 1 - \varepsilon$ for $\varepsilon$ sufficiently small. This, of course, means that the channels $T$ and $T'$ cannot be distinguished by any experimental procedure whenever $\varepsilon$ is less than the threshold resolution $\varepsilon_0$.

We note that strictly contractive channels have been mentioned in the text of Nielsen 
and Chuang [91], but none of their properties, apart from the uniqueness of the fixed 
point, were described. 

3.2.2 Examples 

In this section we give a few examples of strictly contractive channels. All of these channels have been extensively studied by researchers in the field of quantum information theory.



Example 3.2.2 (degenerate channel) Let $\rho$ be a density operator on $\mathcal{H}$, and consider the map $K_\rho : X \mapsto (\mathrm{tr}\,X)\rho$. This is a completely positive trace-preserving map of $\mathcal{T}_1(\mathcal{H})$ into itself, and its restriction to $\mathcal{S}(\mathcal{H})$ is the channel that maps any density operator $\sigma \in \mathcal{S}(\mathcal{H})$ to $\rho$. It is easy to see that $\rho$ is the only fixed point of $K_\rho$, and that $K_\rho$ is strictly contractive with $k = 0$. Channels of this form are called degenerate (in the terminology of Davies [31]). □

Example 3.2.3 (depolarizing channel) Let $\mathcal{H}$ be a Hilbert space of finite dimension $d$. For any $p \in (0,1]$, define the map

$$D_p := (1-p)\,\mathrm{id} + pK_{I/d}.$$

The restriction of $D_p$ to $\mathcal{S}(\mathcal{H})$ is the so-called depolarizing channel

$$D_p(\rho) = (1-p)\rho + p\,\frac{I}{d}.$$

For any pair $\rho, \sigma \in \mathcal{S}(\mathcal{H})$, we have

$$\|D_p(\rho) - D_p(\sigma)\|_1 = (1-p)\|\rho - \sigma\|_1,$$

which shows that $D_p$ is strictly contractive with $k = 1 - p$. The unique $D_p$-invariant density operator is the maximally mixed state $I/d$. Channels that preserve the maximally mixed state are called bistochastic. □

Example 3.2.4 (two-Pauli channel) Consider the channel on $\mathcal{S}(\mathbb{C}^2)$ with the Kraus operators

$$V_1 = \sqrt{p}\,I, \quad V_2 = \sqrt{(1-p)/2}\,\sigma_1, \quad V_3 = -i\sqrt{(1-p)/2}\,\sigma_2.$$

For $0 < p < 1$, this channel is bistochastic and strictly contractive with $k = \max\{p, |2p - 1|\}$. □
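The modulus of the two-Pauli channel can be read off from its matrix in the Pauli basis (the representation developed in Section 3.2.3): the $3\times 3$ block acting on Bloch-Poincaré vectors is $\mathrm{diag}(p, p, 2p-1)$, so $k$ is its largest singular value $\max\{p, |2p-1|\}$. The absolute value matters: for $p < 1/3$ the $\sigma_3$ direction dominates. The sketch below (hypothetical helper names, numpy only) computes this numerically.

```python
import numpy as np

PAULI = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def apply_kraus(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def pauli_matrix(kraus):
    """4x4 real matrix R[i, j] = tr(sigma_i T(sigma_j)) / 2 of the map in the Pauli basis."""
    return np.array([[0.5 * np.trace(Si @ apply_kraus(kraus, Sj)).real for Sj in PAULI]
                     for Si in PAULI])

def contractivity_modulus(kraus):
    """Largest singular value of the 3x3 block acting on Bloch-Poincare vectors."""
    return float(np.linalg.svd(pauli_matrix(kraus)[1:, 1:], compute_uv=False)[0])

def two_pauli(p):
    c = np.sqrt((1 - p) / 2)
    return [np.sqrt(p) * PAULI[0], c * PAULI[1], -1j * c * PAULI[2]]

for p in (0.2, 0.5, 0.9):
    print(p, np.round(np.diag(pauli_matrix(two_pauli(p))), 6), contractivity_modulus(two_pauli(p)))
```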

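Example 3.2.5 (amplitude damping) The channel on $\mathcal{S}(\mathbb{C}^2)$ with the Kraus operators

$$V_1 = \begin{pmatrix}1 & 0\\ 0 & \sqrt{1-\gamma}\end{pmatrix}, \qquad V_2 = \begin{pmatrix}0 & \sqrt{\gamma}\\ 0 & 0\end{pmatrix}, \qquad 0 < \gamma \le 1,$$

is strictly contractive with $k = \sqrt{1-\gamma}$. Its unique fixed point is the pure state $|\psi_+\rangle\langle\psi_+|$ with $\sigma_3\psi_+ = \psi_+$. □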

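Example 3.2.6 (thermalization of a qubit) Consider the Hamiltonian $H = E\sigma_3$ with $E > 0$, and the corresponding Gibbs state

$$\rho_\beta = \frac{1}{2\cosh\beta E}\begin{pmatrix}e^{-\beta E} & 0\\ 0 & e^{\beta E}\end{pmatrix}.$$

Let $p := \exp(-\beta E)/(2\cosh\beta E)$. Consider the map $T : \mathcal{S}(\mathbb{C}^2) \to \mathcal{S}(\mathbb{C}^2)$ given in the Kraus form $T(\rho) = \sum_{n=1}^4 V_n\rho V_n^*$ with

$$V_1 = \sqrt{p}\begin{pmatrix}1 & 0\\ 0 & \sqrt{1-\gamma}\end{pmatrix},\quad V_2 = \sqrt{p}\begin{pmatrix}0 & \sqrt{\gamma}\\ 0 & 0\end{pmatrix},\quad V_3 = \sqrt{1-p}\begin{pmatrix}\sqrt{1-\gamma} & 0\\ 0 & 1\end{pmatrix},\quad V_4 = \sqrt{1-p}\begin{pmatrix}0 & 0\\ \sqrt{\gamma} & 0\end{pmatrix},$$

where $\gamma$ is a constant between 0 and 1. Then we have $T(\rho_\beta) = \rho_\beta$, and a straightforward calculation shows that $T$ is strictly contractive with $k = \sqrt{1-\gamma}$. The constant $\gamma$ can be given a direct physical interpretation. Let us write $\gamma = 1 - e^{-2/\tau}$. Then for $n > \tau$ we will have $\|T^n(\rho) - \rho_\beta\|_1 < e^{-1}\|\rho - \rho_\beta\|_1$. □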
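A numerical check of Example 3.2.6, with the caveat that the Kraus operators are assumed to be the standard generalized amplitude-damping ones ($V_1$, $V_2$ carrying the weight $\sqrt{p}$ and $V_3$, $V_4$ the weight $\sqrt{1-p}$). The sketch verifies trace preservation, $T(\rho_\beta) = \rho_\beta$, and $k = \sqrt{1-\gamma}$ via the largest singular value of the Pauli-basis block; the values of $\beta E$ and $\gamma$ and the helper names are arbitrary illustrative choices.

```python
import numpy as np

PAULI = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def apply_kraus(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def pauli_matrix(kraus):
    """R[i, j] = tr(sigma_i T(sigma_j)) / 2 in the Pauli basis."""
    return np.array([[0.5 * np.trace(Si @ apply_kraus(kraus, Sj)).real for Sj in PAULI]
                     for Si in PAULI])

def thermal_qubit_kraus(p, gamma):
    """Assumed generalized amplitude-damping Kraus operators with fixed point diag(p, 1 - p)."""
    s, g = np.sqrt(1 - gamma), np.sqrt(gamma)
    return [np.sqrt(p) * np.array([[1, 0], [0, s]]),
            np.sqrt(p) * np.array([[0, g], [0, 0]]),
            np.sqrt(1 - p) * np.array([[s, 0], [0, 1]]),
            np.sqrt(1 - p) * np.array([[0, 0], [g, 0]])]

beta_E, gamma = 0.5, 0.19
p = np.exp(-beta_E) / (2 * np.cosh(beta_E))
rho_beta = np.diag([p, 1 - p]).astype(complex)     # e^{-beta H} / Z for H = E sigma_3
kraus = thermal_qubit_kraus(p, gamma)

completeness = sum(K.conj().T @ K for K in kraus)  # equals the identity for a channel
fixed_err = float(np.abs(apply_kraus(kraus, rho_beta) - rho_beta).max())
k = float(np.linalg.svd(pauli_matrix(kraus)[1:, 1:], compute_uv=False)[0])
print(fixed_err, k, np.sqrt(1 - gamma))
```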



3.2.3 Strictly contractive channels on $\mathcal{S}(\mathbb{C}^2)$

Consider a channel $T : \mathcal{S}(\mathbb{C}^n) \to \mathcal{S}(\mathbb{C}^n)$. Because the trace class $\mathcal{T}_1(\mathbb{C}^n)$ is the linear span of $\mathcal{S}(\mathbb{C}^n)$, and because we also have $\mathcal{T}_1(\mathbb{C}^n) = \mathcal{B}(\mathbb{C}^n) = \mathcal{M}_n$, the map $T$ can be uniquely extended to all of $\mathcal{M}_n$. We can naturally identify the space $\mathcal{M}_n$ of $n \times n$ complex matrices with $\mathbb{C}^{n^2}$. Thus any linear map of $\mathcal{M}_n$ to itself can be naturally regarded as an $n^2 \times n^2$ complex matrix.

Under this identification, it is possible to parametrize all completely positive maps on $\mathcal{M}_n$ (see, e.g., the article of Fujiwara and Algoet [43]), but the analysis turns out to be quite involved already in the case of $\mathcal{M}_2$ (cf. King and Ruskai [65] or Ruskai, Szarek, and Werner [114]). In this section we show how the contraction properties of a channel $T : \mathcal{S}(\mathbb{C}^2) \to \mathcal{S}(\mathbb{C}^2)$ can be read off directly from its matrix representation.

First of all, recall that any $2 \times 2$ complex matrix $M$ can be written as a linear combination of the identity matrix and the Pauli matrices

$$\sigma_1 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}, \quad \sigma_2 = \begin{pmatrix}0 & -i\\ i & 0\end{pmatrix}, \quad \sigma_3 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}.$$

In particular, if $M$ is Hermitian, then the coefficients in this expansion will be real. The set $\{I/\sqrt2, \sigma_1/\sqrt2, \sigma_2/\sqrt2, \sigma_3/\sqrt2\}$ forms an orthonormal basis of $\mathcal{M}_2$ when the latter is viewed as a Hilbert space with the Hilbert-Schmidt inner product $\langle A, B\rangle := \mathrm{tr}(A^*B)$. We will refer to this basis as the Pauli basis. The upshot is that we can represent any matrix

$$M = m_0I + m_1\sigma_1 + m_2\sigma_2 + m_3\sigma_3$$

as a vector in $\mathbb{C}^4$ with the components $m_i$, $i = 0, 1, 2, 3$. Furthermore, we have $m_0 = \mathrm{tr}\,M/2$ and $m_i = \mathrm{tr}(M\sigma_i)/2$.

Now it is easy to show that any density matrix $\rho \in \mathcal{S}(\mathbb{C}^2)$ can be written as

$$\rho = \frac12(I + r_1\sigma_1 + r_2\sigma_2 + r_3\sigma_3) = \frac12(I + \mathbf{r}\cdot\boldsymbol{\sigma}),$$

where $r_i \in \mathbb{R}$ and $r_1^2 + r_2^2 + r_3^2 \le 1$, with equality if and only if $\rho$ is a pure state. Thus there is a one-to-one correspondence between the density matrices in $\mathcal{M}_2$ and the points in the closed unit ball in $\mathbb{R}^3$. Under this identification, this ball is known as the Bloch-Poincaré ball. Given $\rho \in \mathcal{S}(\mathbb{C}^2)$, we will refer to the corresponding vector $\mathbf{r} \in \mathbb{R}^3$ as the Bloch-Poincaré vector of $\rho$.
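The correspondence between qubit density matrices and Bloch-Poincaré vectors takes two lines of code: since $\mathrm{tr}(\sigma_i\sigma_j) = 2\delta_{ij}$, the coefficients are $r_i = \mathrm{tr}(\rho\sigma_i)$ (twice the $m_i$ above, because of the overall factor $1/2$). The sketch below (hypothetical helper names) also checks the purity relation $\mathrm{tr}\,\rho^2 = (1 + |\mathbf{r}|^2)/2$, which is why $|\mathbf{r}| = 1$ characterizes pure states.

```python
import numpy as np

SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def bloch_vector(rho):
    """r_i = tr(rho sigma_i), so that rho = (I + r . sigma) / 2."""
    return np.array([np.trace(rho @ s).real for s in SIGMA])

def density_from_bloch(r):
    return 0.5 * (np.eye(2, dtype=complex) + sum(ri * s for ri, s in zip(r, SIGMA)))

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
r_plus = bloch_vector(np.outer(plus, plus.conj()))   # (1, 0, 0): a pure state on the sphere
r_mixed = np.array([0.3, -0.2, 0.4])
rho_mixed = density_from_bloch(r_mixed)
purity = np.trace(rho_mixed @ rho_mixed).real        # (1 + |r|^2) / 2
print(r_plus, bloch_vector(rho_mixed), purity)
```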

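Our characterization of strictly contractive channels on $\mathcal{S}(\mathbb{C}^2)$ hinges on the following important theorem [65].

Theorem 3.2.7 (King-Ruskai) Let $T$ be a channel on $\mathcal{S}(\mathbb{C}^2)$. Then there exist unitaries $U$, $V$ and vectors $\mathbf{t}, \mathbf{v} \in \mathbb{R}^3$ such that

$$T(\rho) = U\,T_{\mathbf{t},\mathbf{v}}(V\rho V^*)\,U^*, \qquad (3.5)$$

where the channel $T_{\mathbf{t},\mathbf{v}}$ is defined by

$$T_{\mathbf{t},\mathbf{v}}\Bigl(w_0I + \sum_{i=1}^3 w_i\sigma_i\Bigr) = w_0I + \sum_{i=1}^3(w_0t_i + v_iw_i)\,\sigma_i. \qquad (3.6)$$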



Proof: First observe that, with respect to the Pauli basis, any trace-preserving map $T : \mathcal{M}_2 \to \mathcal{M}_2$ can be written in block form as

$$T = \begin{pmatrix}1 & \mathbf{0}\\ \mathbf{u} & \mathbf{T}\end{pmatrix}, \qquad (3.7)$$

where $\mathbf{u} \in \mathbb{C}^3$, and $\mathbf{T}$ is a $3 \times 3$ complex matrix. Furthermore, if $T$ is a positive map, then $\mathbf{u} \in \mathbb{R}^3$ and $\mathbf{T}$ is a real matrix; this follows from the fact that any positive map on a C*-algebra $\mathcal{A}$ leaves invariant the set of self-adjoint elements of $\mathcal{A}$ [130].

Any $n \times n$ matrix $A$ can be written in the form $A = VDW^*$, where $V$ and $W$ are unitary, and $D$ is a positive semidefinite diagonal matrix. This is referred to as the singular value decomposition of $A$ [12]. If $A$ is real, the matrices $V$ and $W$ can be chosen real orthogonal. Write down the singular value decomposition $\mathbf{T} = VDW^{\mathsf{T}}$, where $V, W \in O(3)$. Any matrix in $O(3)$ is a rotation (modulo sign), so we can write

$$\mathbf{T} = R_1DR_2^{\mathsf{T}},$$

where $R_1$ and $R_2$ are rotations, and the sign has been absorbed into the matrix $D$. We can then decompose $T$ as follows:

$$\begin{pmatrix}1 & \mathbf{0}\\ \mathbf{u} & \mathbf{T}\end{pmatrix} = \begin{pmatrix}1 & \mathbf{0}\\ \mathbf{0} & R_1\end{pmatrix}\begin{pmatrix}1 & \mathbf{0}\\ R_1^{\mathsf{T}}\mathbf{u} & D\end{pmatrix}\begin{pmatrix}1 & \mathbf{0}\\ \mathbf{0} & R_2^{\mathsf{T}}\end{pmatrix}. \qquad (3.8)$$

Now put $\mathbf{t} := R_1^{\mathsf{T}}\mathbf{u}$, and let $\mathbf{v}$ be the vector whose components are the diagonal entries of $D$. The middle matrix on the right-hand side of Eq. (3.8) is precisely the matrix representation, with respect to the Pauli basis, of the map $T_{\mathbf{t},\mathbf{v}}$ defined in Eq. (3.6), while the first and last matrices correspond to unitary conjugations in $\mathcal{M}_2$. This proves the theorem. ■
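The proof is constructive, and the construction can be followed numerically. In this sketch (hypothetical helper names; the test channel, an amplitude-damping channel sandwiched between two fixed unitaries, is an arbitrary choice) we compute the Pauli-basis matrix of the channel, split off $\mathbf{u}$ and the $3\times 3$ block $\mathbf{T}$, take a real singular value decomposition, flip signs so that both orthogonal factors become proper rotations, and check that the three-factor product of Eq. (3.8) reproduces the original matrix.

```python
import numpy as np

PAULI = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def apply_kraus(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def pauli_matrix(kraus):
    return np.array([[0.5 * np.trace(Si @ apply_kraus(kraus, Sj)).real for Sj in PAULI]
                     for Si in PAULI])

def king_ruskai_factors(kraus):
    """Return R, rotations R1, R2 and vectors t, v with T = R1 diag(v) R2^T and t = R1^T u."""
    R = pauli_matrix(kraus)
    u, T = R[1:, 0], R[1:, 1:]
    V, d, Wt = np.linalg.svd(T)                      # T = V diag(d) W^T with V, W in O(3)
    sv, sw = np.sign(np.linalg.det(V)), np.sign(np.linalg.det(Wt))
    R1, R2 = sv * V, sw * Wt.T                       # now det R1 = det R2 = +1
    v = sv * sw * d                                  # the sign is absorbed into D
    return R, R1, R2, R1.T @ u, v

def block(top_left, lower_left, lower_right):
    """Assemble the 4x4 matrix [[top_left, 0], [lower_left, lower_right]]."""
    M = np.zeros((4, 4))
    M[0, 0] = top_left
    M[1:, 0] = lower_left
    M[1:, 1:] = lower_right
    return M

gamma = 0.3
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
PHASE = np.diag([1, 1j])
amp = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]), np.array([[0, np.sqrt(gamma)], [0, 0]])]
kraus = [HAD @ K @ PHASE for K in amp]

R, R1, R2, t, v = king_ruskai_factors(kraus)
rebuilt = block(1.0, 0.0, R1) @ block(1.0, t, np.diag(v)) @ block(1.0, 0.0, R2.T)
print(np.round(t, 6), np.round(v, 6))
```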



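Using Theorem 3.2.7, we can read off the contraction properties of $T$ from the channel $T_{\mathbf{t},\mathbf{v}}$ (the unitary conjugations $U \cdot U^*$ and $V \cdot V^*$ are irrelevant for this purpose because of the unitary invariance of the trace norm). Thus consider two density operators $\rho, \rho'$ with the Bloch-Poincaré vectors $\mathbf{r}, \mathbf{r}'$. Letting $\Delta := \mathbf{r} - \mathbf{r}'$, we have

$$\|T_{\mathbf{t},\mathbf{v}}(\rho) - T_{\mathbf{t},\mathbf{v}}(\rho')\|_1 = \|T_{\mathbf{t},\mathbf{v}}(\rho - \rho')\|_1 = \frac12\Bigl\|\sum_{i=1}^3 v_i\Delta_i\sigma_i\Bigr\|_1 \le \frac12\Bigl(\max_i|v_i|\Bigr)\Bigl\|\sum_{i=1}^3\Delta_i\sigma_i\Bigr\|_1 = \Bigl(\max_i|v_i|\Bigr)\|\rho - \rho'\|_1. \qquad (3.9)$$

Clearly, the upper bound in Eq. (3.9) is achieved whenever the only nonzero component of $\Delta$ corresponds to the direction in which $|v_i|$ is largest. Hence, if $T$ is strictly contractive, we have $k = \max_i|v_i|$. Writing $T$ in the form (3.7), we see that $k$ is also the largest singular value of the matrix $\mathbf{T}$, i.e., $k = \|\mathbf{T}\|$, the operator norm of $\mathbf{T}$.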
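The claim that $k$ is the largest singular value of $\mathbf{T}$, and that the bound (3.9) is attained, can be checked as follows. This sketch (hypothetical helper names; the same kind of arbitrary test channel as before) computes $k$ from the block, feeds in the pair of antipodal pure states with Bloch-Poincaré vectors $\pm\mathbf{w}$, where $\mathbf{w}$ is the top right singular vector, and measures the trace-norm contraction from eigenvalues, independently of the Bloch geometry. Randomly drawn pairs never exceed $k$.

```python
import numpy as np

PAULI = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def apply_kraus(kraus, rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def pauli_matrix(kraus):
    return np.array([[0.5 * np.trace(Si @ apply_kraus(kraus, Sj)).real for Sj in PAULI]
                     for Si in PAULI])

def density_from_bloch(r):
    return 0.5 * (PAULI[0] + sum(ri * s for ri, s in zip(r, PAULI[1:])))

def trace_norm(a):
    return float(np.abs(np.linalg.eigvalsh(a)).sum())

def contraction_ratio(kraus, r1, r2):
    """||T(rho) - T(sigma)||_1 / ||rho - sigma||_1 for the states with Bloch vectors r1, r2."""
    rho, sigma = density_from_bloch(r1), density_from_bloch(r2)
    return trace_norm(apply_kraus(kraus, rho) - apply_kraus(kraus, sigma)) / trace_norm(rho - sigma)

def random_bloch(rng):
    x = rng.normal(size=3)
    return x / np.linalg.norm(x) * rng.uniform() ** (1 / 3)  # uniform in the unit ball

gamma = 0.3
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
PHASE = np.diag([1, 1j])
amp = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]), np.array([[0, np.sqrt(gamma)], [0, 0]])]
kraus = [HAD @ K @ PHASE for K in amp]

_, sv, Wt = np.linalg.svd(pauli_matrix(kraus)[1:, 1:])
k, w = sv[0], Wt[0]                       # largest singular value and its right singular vector
best = contraction_ratio(kraus, w, -w)    # attains k

rng = np.random.default_rng(1)
ratios = [contraction_ratio(kraus, random_bloch(rng), random_bloch(rng)) for _ in range(2000)]
print(k, best, max(ratios))
```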



Given two density operators $\rho, \rho' \in \mathcal{S}(\mathbb{C}^2)$, let $\mathbf{r}$ and $\mathbf{r}'$ be their Bloch-Poincaré vectors. Then we have the following useful observation. The trace-norm distance $\|\rho - \rho'\|_1$ can be expressed geometrically in terms of the Euclidean distance between $\mathbf{r}$ and $\mathbf{r}'$ as

$$\|\rho - \rho'\|_1 = \sqrt{\Delta_1^2 + \Delta_2^2 + \Delta_3^2},$$

where $\Delta$ is defined as before. The proof is easy once we note that the matrix $\rho - \rho'$ has eigenvalues $\pm(1/2)\sqrt{\Delta_1^2 + \Delta_2^2 + \Delta_3^2}$. Thus the action of any channel $T$ on $\mathcal{S}(\mathbb{C}^2)$ can be visualized, modulo a rotation and a translation, as the shrinking of the Bloch-Poincaré ball into an ellipsoid, and the contractivity modulus of $T$ is precisely the half-length of the longest symmetry axis of this ellipsoid. In this sense, the term "strictly contractive" becomes especially apt. For instance, the action of the depolarizing channel $D_p$ on $\mathcal{S}(\mathbb{C}^2)$ is tantamount to the rescaling of the Bloch-Poincaré ball by the factor of $1 - p$.
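The identity $\|\rho - \rho'\|_1 = |\mathbf{r} - \mathbf{r}'|$ is what turns the contraction question into Euclidean geometry. Here is a quick numerical check (our illustration, hypothetical helper names, random Bloch-Poincaré vectors), together with the observation that the depolarizing channel multiplies every Bloch-Poincaré vector by $1 - p$.

```python
import numpy as np

SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def density_from_bloch(r):
    return 0.5 * (np.eye(2, dtype=complex) + sum(ri * s for ri, s in zip(r, SIGMA)))

def bloch_vector(rho):
    return np.array([np.trace(rho @ s).real for s in SIGMA])

def trace_norm(a):
    return float(np.abs(np.linalg.eigvalsh(a)).sum())

rng = np.random.default_rng(2)
pairs = []  # (trace-norm distance, Euclidean distance of Bloch-Poincare vectors)
for _ in range(500):
    r, rp = (x / max(1.0, np.linalg.norm(x)) for x in rng.uniform(-1, 1, size=(2, 3)))
    pairs.append((trace_norm(density_from_bloch(r) - density_from_bloch(rp)),
                  float(np.linalg.norm(r - rp))))

p = 0.3
r0 = np.array([0.2, 0.5, -0.6])
depolarized = (1 - p) * density_from_bloch(r0) + p * np.eye(2) / 2
scaled = bloch_vector(depolarized)   # equals (1 - p) * r0
print(max(abs(a - b) for a, b in pairs), scaled)
```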

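Tensor products of strictly contractive channels do not lend themselves as easily to an intuitive geometric interpretation, apart from some special cases. In particular, when $T$ and $T'$ are bistochastic strictly contractive channels on $\mathcal{S}(\mathbb{C}^2)$, with the respective contractivity moduli $k$ and $k'$, it can be shown that the product channel $T \otimes T'$ is also strictly contractive with the contractivity modulus $\max\{k, k'\}$. To see this, we first note that any density matrix in $\mathcal{M}_4$ can be written as [87, Ch. 2]

$$\rho = \frac14\Bigl(I \otimes I + \sum_{i=1}^3 r_i\,\sigma_i \otimes I + \sum_{i=1}^3 s_i\,I \otimes \sigma_i + \sum_{i,j=1}^3\Theta_{ij}\,\sigma_i \otimes \sigma_j\Bigr),$$

where the vectors $\mathbf{r}, \mathbf{s} \in \mathbb{R}^3$ are referred to as the coherence vectors of the first and second qubit respectively, and the $3 \times 3$ real matrix $\Theta$ is called the correlation tensor of $\rho$. Hence each density operator $\rho \in \mathcal{M}_4$ can be uniquely described by the ordered triple $(\mathbf{r}, \mathbf{s}, \Theta)$, so we will write $\rho \sim (\mathbf{r}, \mathbf{s}, \Theta)$. The contraction properties of the channel $T \otimes T'$ can be read off from the corresponding channel $T_{\mathbf{0},\mathbf{v}} \otimes T_{\mathbf{0},\mathbf{v}'}$. Consider two density operators $\rho \sim (\mathbf{r}, \mathbf{s}, \Theta)$ and $\rho' \sim (\mathbf{r}', \mathbf{s}', \Theta')$, and define $\Gamma := \mathbf{r} - \mathbf{r}'$, $\Delta := \mathbf{s} - \mathbf{s}'$, and $\Xi := \Theta - \Theta'$. We then have

$$\begin{aligned}
\|(T_{\mathbf{0},\mathbf{v}} \otimes T_{\mathbf{0},\mathbf{v}'})(\rho - \rho')\|_1
&= \frac14\Bigl\|\sum_{i=1}^3 v_i\Gamma_i\,\sigma_i \otimes I + \sum_{i=1}^3 v'_i\Delta_i\,I \otimes \sigma_i + \sum_{i,j=1}^3 v_iv'_j\Xi_{ij}\,\sigma_i \otimes \sigma_j\Bigr\|_1 \\
&\le \frac14\max\{|v_1|, \dots, |v_3|, |v'_1|, \dots, |v'_3|\}\,\Bigl\|\sum_{i=1}^3\Gamma_i\,\sigma_i \otimes I + \sum_{i=1}^3\Delta_i\,I \otimes \sigma_i + \sum_{i,j=1}^3\Xi_{ij}\,\sigma_i \otimes \sigma_j\Bigr\|_1 \\
&= \max\{k, k'\}\,\|\rho - \rho'\|_1,
\end{aligned} \qquad (3.10)$$

where we have used the fact that $|v_i|, |v'_i| < 1$ for all $i$ because $T$ and $T'$ are strictly contractive (so that also $|v_iv'_j| \le \max\{k, k'\}$). Again, the bound (3.10) can be achieved by choosing $\rho$ and $\rho'$ suitably, so we conclude that $T \otimes T'$ is strictly contractive, and that its contractivity modulus equals the greater of $k$ and $k'$.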



3.2.4 The density theorem for strictly contractive channels 

Suppose we are presented with some quantum system in an unknown state $\rho$, and we are trying to determine this state. Any physically realizable apparatus will have finite resolution $\varepsilon$, so that all states $\rho'$ with $\|\rho - \rho'\|_1 < \varepsilon$ are considered indistinguishable from $\rho$. Now, if $\mathcal{H}$ is the Hilbert space associated with the system, and if $\mathcal{D}$ is a dense subset of $\mathcal{S}(\mathcal{H})$, then, by definition of a dense subset, for any $\varepsilon > 0$ and any $\rho \in \mathcal{S}(\mathcal{H})$, there will always be some $\sigma \in \mathcal{D}$ such that $\|\rho - \sigma\|_1 < \varepsilon$.
The same reasoning also applies to distinguishability of quantum channels, except now the appropriate measure of closeness is furnished by the cb-norm. Thus, if an experiment utilizes some apparatus with resolution $\varepsilon$, then any two channels $T$, $S$ with $\|T - S\|_{\mathrm{cb}} < \varepsilon$ are considered indistinguishable from each other. There is, however, no fundamental difference between distinguishability of states and distinguishability of channels because any experiment purporting to distinguish between two given channels $T$ and $S$ consists in preparing the apparatus in some state $\rho$ and then making some measurements that would tell the states $T(\rho)$ and $S(\rho)$ apart from each other. Then, since for any state $\rho$, $\|T(\rho) - S(\rho)\|_1 \le \|T - S\|_{\mathrm{cb}}$, the resolving power of the apparatus that will distinguish between $T$ and $S$ is limited by the resolving power of the apparatus that will distinguish between $T(\rho)$ and $S(\rho)$.

The main result of this section is summarized in the following theorem [101]. 

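Theorem 3.2.8 Let $\mathcal{C}(\mathcal{H})$ be the set of all channels on $\mathcal{S}(\mathcal{H})$. Then the set $\mathcal{C}_{\mathrm{sc}}(\mathcal{H})$ of all strictly contractive channels on $\mathcal{S}(\mathcal{H})$ is a $\|\cdot\|_{\mathrm{cb}}$-dense convex subset of $\mathcal{C}(\mathcal{H})$.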

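Proof: We show convexity first. Suppose $T_1, T_2\in\mathcal{C}_{\mathrm{sc}}(\mathcal{H})$. Define the channel $S := \lambda T_1 + (1-\lambda)T_2$, $0\le\lambda\le 1$. Then, for any $\rho,\sigma\in\mathcal{S}(\mathcal{H})$, we have the estimate

$$
\begin{aligned}
\|S(\rho)-S(\sigma)\|_1 &\le \lambda\|T_1(\rho)-T_1(\sigma)\|_1 + (1-\lambda)\|T_2(\rho)-T_2(\sigma)\|_1\\
&\le [\lambda k(T_1) + (1-\lambda)k(T_2)]\,\|\rho-\sigma\|_1,
\end{aligned}
$$

where $k(T_i)$ is the contractivity modulus of $T_i$, $i\in\{1,2\}$. Defining $k := \max\{k(T_1),k(T_2)\}$, we get

$$
\|S(\rho)-S(\sigma)\|_1 \le k\,\|\rho-\sigma\|_1 .
$$

Since $T_1, T_2$ are strictly contractive, $k<1$, and therefore $S\in\mathcal{C}_{\mathrm{sc}}(\mathcal{H})$. To prove density, let us fix some $\sigma\in\mathcal{S}(\mathcal{H})$ and let $K_\sigma$ denote the channel that maps every state to $\sigma$. Given $\epsilon>0$, pick some positive integer $n$ such that $1/n<\epsilon$. For any $T\in\mathcal{C}(\mathcal{H})$, define

$$
T_n := \Big(1-\frac{1}{2n}\Big)T + \frac{1}{2n}K_\sigma .
$$

Clearly, $T_n\in\mathcal{C}_{\mathrm{sc}}(\mathcal{H})$: since $T_n(\rho)-T_n(\rho') = (1-\frac{1}{2n})[T(\rho)-T(\rho')]$, we have $k(T_n)\le 1-\frac{1}{2n}<1$. The estimate

$$
\|T-T_n\|_{\mathrm{cb}} = \frac{1}{2n}\|T-K_\sigma\|_{\mathrm{cb}} \le \frac{1}{n} < \epsilon
$$

finishes the proof. ■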
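The density construction is easy to check numerically in the simplest case. The sketch below is our illustration, not part of the original proof: it works with a single qubit, where a state is a Bloch vector and $\|\rho-\sigma\|_1$ equals the Euclidean distance between Bloch vectors. It assumes, hypothetically, that $T$ is a unitary channel (a rotation, so $k(T)=1$) and picks an arbitrary replacement state $\sigma$; every ratio it computes should equal $1-1/(2n)$, the contraction factor of $T_n$.

```python
import math
import random

def rotation_z(theta):
    """Bloch-ball action of a unitary channel: a rotation about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply(matrix, vec):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(matrix[i][j] * vec[j] for j in range(3)) for i in range(3)]

def trace_distance(r, s):
    """||rho - sigma||_1 for two qubit states, computed from their Bloch vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))

def random_bloch(rng):
    """Uniformly random point of the Bloch ball (rejection sampling)."""
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        if sum(x * x for x in v) <= 1.0:
            return v

def t_n(r, rot, sigma, n):
    """Bloch action of T_n = (1 - 1/(2n)) T + (1/(2n)) K_sigma for a unitary T."""
    lam = 1.0 / (2 * n)
    return [(1 - lam) * a + lam * b for a, b in zip(apply(rot, r), sigma)]

def contraction_ratios(n, trials=1000, seed=0):
    """Ratios ||T_n(rho) - T_n(rho')||_1 / ||rho - rho'||_1 over random state pairs."""
    rng = random.Random(seed)
    rot = rotation_z(1.234)          # hypothetical unitary T, so k(T) = 1
    sigma = [0.3, -0.2, 0.5]         # hypothetical replacement state (inside the ball)
    ratios = []
    for _ in range(trials):
        r, s = random_bloch(rng), random_bloch(rng)
        d_in = trace_distance(r, s)
        if d_in > 1e-3:              # skip nearly coincident pairs (rounding)
            d_out = trace_distance(t_n(r, rot, sigma, n), t_n(s, rot, sigma, n))
            ratios.append(d_out / d_in)
    return ratios

ratios = contraction_ratios(n=5)
print(min(ratios), max(ratios))  # both 1 - 1/(2n) = 0.9, up to floating-point rounding
```

Because $K_\sigma$ sends every input to the same output, it cancels in any difference, which is why the contraction factor is exactly $1-1/(2n)$ whatever $\sigma$ is chosen.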



This theorem indicates that, as far as physically realizable (finite-precision) measurements go, there is no way to distinguish a given channel $T$ from some strictly contractive $T'$ with $\|T-T'\|_{\mathrm{cb}}<\epsilon$, where $\epsilon$ is the resolution of the measuring apparatus. We can also consider the channel fidelity as the measure of distinguishability, in which case we have the following corollary.

Corollary 3.2.9 For any channel $T:\mathcal{S}(\mathcal{H})\to\mathcal{S}(\mathcal{H})$ and any $\epsilon>0$, there exists a strictly contractive channel $T':\mathcal{S}(\mathcal{H})\to\mathcal{S}(\mathcal{H})$ such that $\mathcal{F}(T,T')>1-\epsilon$.

Proof: Given $\epsilon$, Theorem 3.2.8 says that there exists a strictly contractive channel $T'$ with $\|T-T'\|_{\mathrm{cb}} < 2(1-\sqrt{1-\epsilon})$. Then Eq. (2.53) implies that $\mathcal{F}(T,T')>1-\epsilon$. ■



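We also mention that any channel $T$ with $\|T-\mathrm{id}\|_{\mathrm{cb}} < \epsilon$ (for some sufficiently small $\epsilon>0$) cannot be distinguished from a depolarizing channel. Indeed, it suffices to pick some

$$
n > \frac{\|K_{\mathbb{1}/d}-\mathrm{id}\|_{\mathrm{cb}}}{\epsilon-\|T-\mathrm{id}\|_{\mathrm{cb}}},
$$

where $d = \dim\mathcal{H}$, so that the depolarizing channel $D_{1/n} := (1-\frac{1}{n})\,\mathrm{id} + \frac{1}{n}K_{\mathbb{1}/d}$ satisfies

$$
\|T-D_{1/n}\|_{\mathrm{cb}} \le \|T-\mathrm{id}\|_{\mathrm{cb}} + \frac{1}{n}\|K_{\mathbb{1}/d}-\mathrm{id}\|_{\mathrm{cb}} < \epsilon.
$$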
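The choice of $n$ above is a one-line computation. In this sketch (an illustration only; the function names are hypothetical) the two cb-distances are supplied as numbers, since computing cb-norms is beyond its scope. Using the crude upper bound $\|K_{\mathbb{1}/d}-\mathrm{id}\|_{\mathrm{cb}}\le 2$ (each channel has cb-norm 1) in place of the exact value still yields a valid, if larger, $n$.

```python
import math

def depolarizing_index(dist_T_id, dist_K_id, eps):
    """Smallest integer n with n > dist_K_id / (eps - dist_T_id).

    dist_T_id stands for ||T - id||_cb and dist_K_id for ||K_{1/d} - id||_cb
    (or any upper bound on it); both are inputs, not computed here.
    """
    if not dist_T_id < eps:
        raise ValueError("the argument needs ||T - id||_cb < eps")
    return math.floor(dist_K_id / (eps - dist_T_id)) + 1

def distance_bound(dist_T_id, dist_K_id, n):
    """Triangle-inequality bound on ||T - D_{1/n}||_cb."""
    return dist_T_id + dist_K_id / n

# Hypothetical numbers: ||T - id||_cb = 0.01, eps = 0.04, ||K_{1/d} - id||_cb <= 2.
n = depolarizing_index(0.01, 2.0, 0.04)
print(n, distance_bound(0.01, 2.0, n))
```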



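We note that the channel formed by taking a convex combination of any channel with a strictly contractive channel (with nonzero weight on the latter) is a strictly contractive channel. Let $T\in\mathcal{C}$ be an arbitrary channel, and suppose that $T'\in\mathcal{C}_{\mathrm{sc}}$ [from now on, we will not mention the Hilbert space $\mathcal{H}$ when talking about channels on $\mathcal{S}(\mathcal{H})$, unless this omission might cause ambiguity]. Define, for some $0\le\lambda<1$, the channel $S := \lambda T + (1-\lambda)T'$. Then

$$
\begin{aligned}
\|S(\rho)-S(\sigma)\|_1 &\le \lambda\|T(\rho)-T(\sigma)\|_1 + (1-\lambda)\|T'(\rho)-T'(\sigma)\|_1\\
&\le [\lambda + (1-\lambda)k(T')]\,\|\rho-\sigma\|_1 .
\end{aligned}
$$

Since $\lambda + (1-\lambda)k(T') < 1$, we conclude that $S\in\mathcal{C}_{\mathrm{sc}}$.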

Finally, we mention that a method similar to that in the proof of Theorem 3.2.8 can be used to show that the set of all bistochastic strictly contractive channels is a dense convex subset of the set of all bistochastic channels.



3.3 Strictly contractive dynamics of quantum registers and computers

So far we have established two important properties of strictly contractive channels. Firstly, any channel $T$ can be approximated arbitrarily closely by a strictly contractive channel $T'$, i.e., for any $\epsilon > 0$, we can find a strictly contractive channel $T'$ such that $\mathcal{F}(T,T') > 1-\epsilon$. Secondly, any quantum decision strategy that would, in principle, distinguish some pair $\rho, \rho'$ of density operators with certainty will fail with probability at least $[1-k(T)]/2$ in the presence of a strictly contractive error channel $T$. The latter statement can also be phrased as follows: no two density operators in the image $T\mathcal{S}(\mathcal{H})$ of $\mathcal{S}(\mathcal{H})$ under some $T\in\mathcal{C}_{\mathrm{sc}}$ have orthogonal supports; furthermore, the trace-norm distance between any two density operators in $T\mathcal{S}(\mathcal{H})$ is bounded from above by $2k$.

In this section we obtain dimension-independent estimates on decoherence rates of 
quantum memories and computers under the influence of strictly contractive noise and
without any error correction (the possibility of error correction will be addressed in the 
next two sections). 

We treat quantum memories (registers) first. Suppose that we want to store a state $\rho_0\in\mathcal{S}(\mathcal{H})$ for time $t$ in the presence of errors modeled by some strictly contractive channel $T$. Let $\tau$ be the decoherence timescale, with $\tau \ll t$, and let $n = \lfloor t/\tau\rfloor$. The final state of the register is then $\rho_n = T^n(\rho_0)$. If $\rho_T$ is the unique fixed point of $T$, then

$$
\|\rho_n-\rho_T\|_1 = \|T^n(\rho_0) - T^n(\rho_T)\|_1 \le k(T)^n\,\|\rho_0-\rho_T\|_1 .
$$

In other words, the state $\rho_0$, stored in a quantum register in the presence of strictly contractive noise $T$, evolves to the unique $T$-invariant state $\rho_T$, and the convergence is exponentially fast. For the sake of concreteness let us consider a numerical example. Suppose that $k(T) = 0.9$, and that initially the states $\rho_0$ and $\rho_T$ have orthogonal supports, so $\|\rho_0-\rho_T\|_1 = 2$. Then, after $n = 10$ iterations (i.e., $t = 10\tau$), we have $\|\rho_n-\rho_T\|_1 \le 2(0.9)^{10} \approx 0.697$, and the probability of correct discrimination between $\rho_n$ and $\rho_T$, namely $\frac12 + \frac14\|\rho_n-\rho_T\|_1$, is at most about 0.674. Note that the decoherence rate estimate

$$
\frac{\|\rho_n-\rho_T\|_1}{\|\rho_0-\rho_T\|_1} \le k(T)^n
$$

does not depend on the dimension of $\mathcal{H}$, but only on the contractivity modulus $k(T)$ and on the relative storage duration $n$. In other words, quantum registers of any size are equally sensitive to strictly contractive errors with the same contractivity modulus.
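The numbers in this example follow from the bound $2k(T)^n$ together with the Helstrom formula $\frac12+\frac14\|\rho-\sigma\|_1$ for the optimal probability of discriminating two equiprobable states. A minimal sketch (the function names are ours):

```python
def register_bound(k, n, initial_distance=2.0):
    """Upper bound k^n * ||rho_0 - rho_T||_1 on ||rho_n - rho_T||_1."""
    return initial_distance * k ** n

def helstrom_success(trace_distance):
    """Optimal success probability for two equiprobable states: 1/2 + ||rho - sigma||_1 / 4."""
    return 0.5 + trace_distance / 4.0

d10 = register_bound(0.9, 10)
print(round(d10, 3), round(helstrom_success(d10), 3))  # 0.697 0.674
```

Only $k$ and $n$ enter, which is the dimension independence noted above.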

Obtaining estimates on decoherence rates of computers is not so simple because, in general, the sequence $\{\rho_n\}$, where $\rho_n$ is the overall state of the computer after $n$ computational steps, does not have to be convergent. Let us first fix the model of a quantum computer. We define an ideal quantum circuit of size $n$ to be an ordered $n$-tuple of unitaries $U_i$, where each $U_i$ is a tensor product of elements of some set $\mathcal{G}$ of universal gates [8], which must be a dense subgroup of the group $\mathcal{U}(\mathcal{H})$ of all unitary operators on $\mathcal{H}$. For some error channel $T$, a $T$-noisy quantum circuit of size $n$ with $r$ error locations is an ordered $(n+r)$-tuple containing $n$ channels $\mathcal{U}_i := U_i\cdot U_i^*$, where the unitaries $U_i$ are of the form described above, as well as $r$ instances of $T$. We will assume, for simplicity, that each $T$ is preceded and followed by some $\mathcal{U}_i$ and $\mathcal{U}_{i+1}$. Based on this definition, the "noisiest" computer for fixed $T$ and $n$ is represented by a $T$-noisy quantum circuit of size $n$ with $n$ error locations, i.e., by a $2n$-tuple of the form $(\mathcal{U}_1, T, \mathcal{U}_2, T, \ldots, \mathcal{U}_n, T)$. If the initial state of the computer is $\rho_0$, then we will use the notation

$$
\rho_n = \Big(\prod_{i=1}^{n} T\,\mathcal{U}_i\Big)(\rho_0) \qquad (3.11)
$$

to signify the state of the computer after $n$ computational steps. In the above expression, the product sign should be understood in the sense of composition $T\circ\mathcal{U}_n\circ\cdots\circ T\circ\mathcal{U}_1$.

Given an arbitrary sequence of computational steps, the sequence $\{\rho_n\}$ defined by Eq. (3.11) (assuming that $n$ is sufficiently large, i.e., the computation is sufficiently long) need not be convergent. However, if the channel $T$ is strictly contractive, then for any $\epsilon>0$ there will exist some $N_0$ such that, for any pair of initial states $\rho_0, \rho'_0\in\mathcal{S}(\mathcal{H})$, the states $\rho_n, \rho'_n$ with $n > N_0$ will be indistinguishable from each other (to within $\epsilon$). In other words, any two sufficiently lengthy computations will yield nearly the same final state.

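Using Eq. (3.11), as well as unitary invariance of the trace norm, we obtain

$$
\|\rho_n-\rho'_n\|_1 = \Big\|\Big(\prod_{i=1}^{n} T\,\mathcal{U}_i\Big)(\rho_0-\rho'_0)\Big\|_1 \le k(T)^n\,\|\rho_0-\rho'_0\|_1 .
$$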
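The estimate can be watched in a toy simulation. The sketch below is a hypothetical setup chosen for illustration, not a model taken from the text: a single qubit, random rotations standing in for the gates $U_i$, and qubit amplitude damping with parameter $\gamma = 0.19$ as the strictly contractive $T$ (its contractivity modulus is $\sqrt{1-\gamma}=0.9$, the largest factor by which it shrinks a difference of Bloch vectors). Two orthogonal initial states are run through the same noisy circuit and their trace distance is compared with $2k(T)^n$.

```python
import math
import random

def rodrigues(axis, theta):
    """Rotation matrix about a unit axis: the Bloch-ball action of a qubit unitary."""
    x, y, z = axis
    c, s = math.cos(theta), math.sin(theta)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]

def random_gate(rng):
    """Hypothetical gate U_i: a rotation about a random axis by a random angle."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in v))
    return rodrigues([a / norm for a in v], rng.uniform(0.0, 2.0 * math.pi))

def amplitude_damping(r, gamma):
    """Bloch action of amplitude damping; its contractivity modulus is sqrt(1 - gamma)."""
    a = math.sqrt(1.0 - gamma)
    return [a * r[0], a * r[1], (1.0 - gamma) * r[2] + gamma]

def noisy_circuit(r0, gates, gamma):
    """Trajectory rho_0, rho_1, ..., with rho_i = T(U_i rho_{i-1} U_i^*)."""
    traj = [list(r0)]
    for g in gates:
        rotated = [sum(g[i][j] * traj[-1][j] for j in range(3)) for i in range(3)]
        traj.append(amplitude_damping(rotated, gamma))
    return traj

def distance(r, s):
    """Trace distance of two qubit states = Euclidean distance of Bloch vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))

gamma = 0.19                      # assumed noise strength, k(T) = sqrt(0.81) = 0.9
k = math.sqrt(1.0 - gamma)
rng = random.Random(7)
gates = [random_gate(rng) for _ in range(60)]
traj_a = noisy_circuit([0.0, 0.0, 1.0], gates, gamma)   # |0><0|
traj_b = noisy_circuit([0.0, 0.0, -1.0], gates, gamma)  # |1><1|, orthogonal to |0><0|
for n in (0, 10, 30, 60):
    print(n, distance(traj_a[n], traj_b[n]), 2.0 * k ** n)
```

The observed distance never exceeds $2k^n$; it is typically smaller, because amplitude damping shrinks the $z$ component of a difference by $1-\gamma < \sqrt{1-\gamma}$.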
Now suppose that at the end of the computation we perform a measurement with precision $\epsilon$, i.e., any two states $\rho, \rho'$ with $\|\rho-\rho'\|_1 < \epsilon$ are considered indistinguishable. Then, since $\|\rho_0-\rho'_0\|_1 \le 2$, if the computation takes more than $N_0 = \lfloor \log(\epsilon/2)/\log k(T)\rfloor$ steps, we will have $\|\rho_n-\rho'_n\|_1 < \epsilon$ for all $n > N_0$. For a numerical illustration, we take $k(T) = 0.9$ and $\epsilon = 0.01$, which yields $N_0 = 50$. In other words, the result of any computation that takes more than 50 steps in the presence of a strictly contractive channel $T$ with $k(T) = 0.9$ is untrustworthy, since the final states obtained from any two initial states lie within trace distance 0.01 of each other, and we will not be able to distinguish between any two states $\rho$ and $\rho'$ with $\|\rho-\rho'\|_1 < 0.01$. Again, $N_0$ depends only on the contractivity modulus of $T$ and on the measurement precision $\epsilon$, not on the dimension of $\mathcal{H}$, at least not explicitly. We note that, if the state of the computer is a density operator over a $2^s$-dimensional Hilbert space, then any efficient quantum computation will take $O(\mathrm{poly}(s))$ steps, and therefore the sensitivity of the computer's algorithm to errors grows exponentially with $s$.
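The threshold $N_0$ depends only on $k(T)$ and $\epsilon$, which the following sketch makes explicit (the function name is ours); no Hilbert-space dimension enters anywhere.

```python
import math

def steps_until_indistinguishable(k, eps):
    """N_0 = floor(log(eps/2) / log k): for every n > N_0, 2 k^n < eps."""
    return math.floor(math.log(eps / 2) / math.log(k))

n0 = steps_until_indistinguishable(0.9, 0.01)
print(n0)  # 50
```

The floor, combined with the strict inequality $n > N_0$, guarantees $2k^n < \epsilon$; here $2(0.9)^{50} \approx 0.0103$ is still above $0.01$, while $2(0.9)^{51} \approx 0.0093$ is below it.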

There are, however, some cases when the sequence $\{\rho_n\}$ does converge. Suppose first that the channel $T\in\mathcal{C}_{\mathrm{sc}}$ is bistochastic. Then, since each channel $\mathcal{U}_i$ is bistochastic as well, the maximally mixed state $\mathbb{1}/d$, where $d = \dim\mathcal{H}$, is a common fixed point of all the channels $T\,\mathcal{U}_i$, and the estimate above with $\rho'_0 = \mathbb{1}/d$ shows that the sequence $\{\rho_n\}$ converges exponentially fast to $\mathbb{1}/d$. Also, if the computation employs a static algorithm, i.e., $U_i = U$ for all $i$ (this is true, e.g., in the case of Grover's search algorithm [55]), then the channel $S := T\,\mathcal{U}$ is also strictly contractive, and $k(S) = k(T)$ by unitary invariance of the trace norm. Denoting the fixed point of $S$ by $\rho_S$, we then have

$$
\|\rho_n-\rho_S\|_1 = \|S^n(\rho_0) - S^n(\rho_S)\|_1 \le k(T)^n\,\|\rho_0-\rho_S\|_1,
$$

i.e., the output state of any sufficiently lengthy computation with a static algorithm will be indistinguishable from the fixed point $\rho_S$ of $S = T\,\mathcal{U}$.
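For the static case the fixed point $\rho_S$ can be located by simply iterating $S$, as the contraction estimate guarantees. The sketch below again assumes, for illustration only, a qubit with amplitude-damping noise ($\gamma=0.19$, so $k=0.9$) and a hypothetical static gate $U$ given by a rotation about the $x$ axis; two different initial states converge to the same point, which satisfies $S(\rho_S)=\rho_S$ up to rounding.

```python
import math

def rotation_x(theta):
    """Bloch-ball action of a hypothetical static gate U: rotation about the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def static_step(r, rot, gamma):
    """One application of S = T o U: rotate by the static gate, then amplitude-damp."""
    x, y, z = (sum(rot[i][j] * r[j] for j in range(3)) for i in range(3))
    a = math.sqrt(1.0 - gamma)
    return (a * x, a * y, (1.0 - gamma) * z + gamma)

def iterate(r0, rot, gamma, steps):
    """Apply S repeatedly, starting from the Bloch vector r0."""
    r = tuple(r0)
    for _ in range(steps):
        r = static_step(r, rot, gamma)
    return r

def distance(r, s):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))

rot = rotation_x(0.7)   # hypothetical static gate U
gamma = 0.19            # assumed noise strength, k(S) = k(T) = 0.9
ra = iterate((0.0, 0.0, 1.0), rot, gamma, 300)
rb = iterate((0.0, 0.0, -1.0), rot, gamma, 300)
residual = distance(static_step(ra, rot, gamma), ra)  # how far ra is from being fixed
print(ra, distance(ra, rb), residual)
```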

3.4 Error correction and strictly contractive channels 

After we have seen that quantum memories and computers are ultrasensitive to errors 
modeled by strictly contractive channels, we must address the issue of error correction 
(stabilization of quantum information). Since we have not made any specific assumptions (beyond strict contractivity) about the errors affecting the computer, it is especially important to investigate the possibility of error correction, if only to determine the limitations on the robustness of physically realizable quantum computers from the foundational
standpoint. 

3.4.1 The basics of quantum error correction 

The simplest scheme for protecting quantum information is a straightforward adaptation 
of a classical error-correcting code [86]. The basic object in the construction of a quantum 
error-correcting code is defined as follows. 

Definition 3.4.1 An $(n,k)$ quantum code is an isometry $V:\mathcal{H}\to\mathcal{K}$ from the $2^k$-dimensional Hilbert space $\mathcal{H}$ onto the $2^k$-dimensional subspace $\mathcal{H}_{\mathrm{c}}$ (the code) of a $2^n$-dimensional coding space $\mathcal{K}$.
In other words, an input state $\rho\in\mathcal{S}(\mathcal{H})$ gives rise to the encoded state $V\rho V^*\in\mathcal{S}(\mathcal{K})$. The encoded state is acted upon by some known channel $T$, which models the errors. Then we say that $V$ is a $T$-correcting code if and only if there exists a recovery channel $R$ such that, for any $\rho\in\mathcal{S}(\mathcal{H})$, we have

$$
(R\circ T)(V\rho V^*) = \rho. \qquad (3.12)
$$

The channel $R\circ T\circ(V\cdot V^*)$ can be viewed as the composition of the encoding step, the noisy channel, and the decoding step. The isometry $V$ can be eliminated from Eq. (3.12) by writing it as

$$
(R\circ T)(\rho) = \rho \qquad \forall\rho\in\mathcal{S}(\mathcal{H}_{\mathrm{c}}). \qquad (3.13)
$$

We then say that $\mathcal{H}_{\mathrm{c}}$ is a $T$-correcting code if and only if Eq. (3.13) holds for some channel $R$.

The following theorem, due to Knill and Laflamme [68], is a tool for determining whether a given subspace of the coding space can serve as a $T$-correcting code.



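Theorem 3.4.2 (Knill-Laflamme) Let $\mathcal{K}$ be a Hilbert space, and consider a channel $T:\mathcal{S}(\mathcal{K})\to\mathcal{S}(\mathcal{K})$. Let $\{V_\alpha\}$ be a Kraus decomposition of $T$. Then a subspace $\mathcal{H}_{\mathrm{c}}$ of $\mathcal{K}$ is a $T$-correcting code if and only if, for all $\psi,\phi\in\mathcal{H}_{\mathrm{c}}$,

$$
\langle\psi|V_\alpha^* V_\beta|\phi\rangle = c_{\alpha\beta}\langle\psi|\phi\rangle,
$$

where $c_{\alpha\beta}$ is some constant that depends only on $V_\alpha$ and $V_\beta$.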
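Theorem 3.4.2 translates directly into a finite check. The sketch below (our illustration) tests the condition for the three-qubit repetition code spanned by $|000\rangle$ and $|111\rangle$, using the error operators $I, X_1, X_2, X_3$ of the bit-flip channel; the actual Kraus operators are scalar multiples of these, which only rescales $c_{\alpha\beta}$. Adding a single phase flip $Z_1$ violates the condition, as expected. Vectors are plain Python lists with real amplitudes, which suffices for these Pauli operators.

```python
from itertools import product

N_QUBITS = 3
DIM = 2 ** N_QUBITS

def basis_vector(bits):
    """Computational basis vector |bits>, with qubit 0 the leftmost bit."""
    v = [0.0] * DIM
    v[int(bits, 2)] = 1.0
    return v

def pauli_x(vec, q):
    """Apply the bit flip X to qubit q."""
    mask = 1 << (N_QUBITS - 1 - q)
    out = [0.0] * DIM
    for idx, amp in enumerate(vec):
        out[idx ^ mask] = amp
    return out

def pauli_z(vec, q):
    """Apply the phase flip Z to qubit q."""
    mask = 1 << (N_QUBITS - 1 - q)
    return [-amp if idx & mask else amp for idx, amp in enumerate(vec)]

def inner(u, v):
    """<u|v> for real amplitude vectors (all vectors here are real)."""
    return sum(a * b for a, b in zip(u, v))

def knill_laflamme_holds(codewords, errors, tol=1e-12):
    """Check <c_i|E_a^* E_b|c_j> = c_ab delta_ij for Hermitian error operators E_a."""
    for ea, eb in product(errors, repeat=2):
        # E_a is Hermitian, so E_a^* E_b |c_j> = E_a(E_b(c_j)).
        m = [[inner(ci, ea(eb(cj))) for cj in codewords] for ci in codewords]
        for i in range(len(codewords)):
            if abs(m[i][i] - m[0][0]) > tol:
                return False
            for j in range(len(codewords)):
                if i != j and abs(m[i][j]) > tol:
                    return False
    return True

code = [basis_vector("000"), basis_vector("111")]
identity = lambda v: v
bit_flips = [identity] + [lambda v, q=q: pauli_x(v, q) for q in range(N_QUBITS)]
phase_flip = [identity, lambda v: pauli_z(v, 0)]

print(knill_laflamme_holds(code, bit_flips))   # True
print(knill_laflamme_holds(code, phase_flip))  # False
```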

We do not give the proof of Theorem 3.4.2 because it is not important for our purposes; 
the interested reader can consult either the original proof of Knill and Laflamme [68] , or 
an alternative argument due to Nielsen et al. [90]. Knill and Laflamme have also given 
another criterion [68] in terms of maximally entangled pure states on $\mathcal{H}_{\mathrm{c}}\otimes\mathcal{H}_{\mathrm{c}}$. Please note that our proof of this criterion differs from the original argument in that it relies on the concept of the channel fidelity (cf. Section 2.4.2).



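Theorem 3.4.3 (Knill-Laflamme) Let $\mathcal{K}$ be a Hilbert space, and consider a channel $T:\mathcal{S}(\mathcal{K})\to\mathcal{S}(\mathcal{K})$. Then a subspace $\mathcal{H}_{\mathrm{c}}$ of $\mathcal{K}$ is a $T$-correcting code if and only if there exists a channel $R:\mathcal{S}(\mathcal{K})\to\mathcal{S}(\mathcal{K})$ such that, for any orthonormal basis $\{e_i\}$ of $\mathcal{H}_{\mathrm{c}}$,

$$
\big((R\circ T)\otimes\mathrm{id}\big)\Big(\sum_{i,j}|e_i\otimes e_i\rangle\langle e_j\otimes e_j|\Big) = \sum_{i,j}|e_i\otimes e_i\rangle\langle e_j\otimes e_j|. \qquad (3.14)
$$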



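Proof: We note that Eq. (3.14) is equivalent to the statement that $\mathcal{F}(R\circ T|_{\mathcal{H}_{\mathrm{c}}}, \mathrm{id}_{\mathcal{B}(\mathcal{H}_{\mathrm{c}})}) = 1$, where $R\circ T|_{\mathcal{H}_{\mathrm{c}}}$ denotes the restriction of the channel $R\circ T$ to $\mathcal{S}(\mathcal{H}_{\mathrm{c}})$. However, this will hold if and only if $R\circ T|_{\mathcal{H}_{\mathrm{c}}} = \mathrm{id}_{\mathcal{B}(\mathcal{H}_{\mathrm{c}})}$, which proves the theorem. ■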
The Knill-Laflamme theory also provides for approximately correctable channels. That is, let $\{V_\alpha\}$ be a Kraus decomposition of some channel $T$ on $\mathcal{S}(\mathcal{K})$. For any subset $A$ of $\{V_\alpha\}$, we can define the completely positive map $T_A$ via

$$
T_A(X) := \sum_{V_\alpha\in A} V_\alpha X V_\alpha^*, \qquad \forall X\in\mathcal{B}(\mathcal{K}).
$$

Then a subspace $\mathcal{H}_{\mathrm{c}}$ of $\mathcal{K}$ can serve as a $T_A$-correcting code if there exists some channel $R$ such that, for all $\rho\in\mathcal{S}(\mathcal{H}_{\mathrm{c}})$,

$$
(R\circ T_A)(\rho) \propto \rho.
$$

If $\|T-T_A\|_{\mathrm{cb}}$ is sufficiently small, then it makes sense to say that the noisy channel $T$ is approximately correctable.

The method of quantum error-correcting codes is ill-suited for dealing with correlated errors. A more general approach to the stabilization of quantum information is described in the work of Knill, Laflamme, and Viola [69] and Zanardi [148], the essence of which we now summarize. Given some quantum system with the associated finite-dimensional Hilbert space $\mathcal{H}$, we consider the error channel $T$ with the Kraus operators $V_\alpha$. We define the interaction algebra $\mathcal{V}$ of $T$ as the *-algebra generated by the $V_\alpha$ (i.e., as the norm closure of the set of all polynomials in the $V_\alpha$ and their adjoints). It is obvious that $\mathcal{V}$ is an algebra with identity because of the condition $\sum_\alpha V_\alpha^* V_\alpha = I$. However, since the Kraus representation of a channel $T$ is not unique, we must make sure that, for any two choices $\{V_\alpha\}$ and $\{W_\beta\}$ of Kraus decompositions of $T$, the corresponding interaction algebras are equal. Using the fact that any two Kraus decompositions of a channel are connected via

$$
V_\alpha = \sum_\beta u_{\alpha\beta} W_\beta,
$$

where $u_{\alpha\beta}$ are the entries of a matrix $U$ with $U^*U = I$ (so that, conversely, $W_\beta = \sum_\alpha \bar{u}_{\alpha\beta} V_\alpha$), we see that it is indeed the case that the interaction algebra of a channel $T$ does not depend on the particular choice of the Kraus operators.

The existence of noiseless subsystems with respect to $T$ hinges on the reducibility of the interaction algebra $\mathcal{V}$. Since $\mathcal{V}$ is, by definition, a uniformly closed *-subalgebra of $\mathcal{B}(\mathcal{H})$, it is a finite-dimensional C*-algebra. A basic result from representation theory [1] tells us that $\mathcal{V}$ is isomorphic to a direct sum of $r$ full matrix algebras, each of which appears with multiplicity $m_i$ and has dimension $n_i^2$ (i.e., it is an algebra of $n_i\times n_i$ complex matrices). Thus $\dim\mathcal{V} = \sum_{i=1}^r n_i^2$. The commutant $\mathcal{V}'$ of $\mathcal{V}$ is defined as the set of all operators $X\in\mathcal{B}(\mathcal{H})$ that commute with all $V\in\mathcal{V}$. From the Wedderburn theorem [151, p. 61] it follows that each $V\in\mathcal{V}$ has the form

$$
V = \bigoplus_{i=1}^r I_{m_i}\otimes V_i, \qquad V_i\in\mathcal{M}_{n_i}, \qquad (3.15)
$$

and that each $V'\in\mathcal{V}'$ has the form

$$
V' = \bigoplus_{i=1}^r V'_i\otimes I_{n_i}, \qquad V'_i\in\mathcal{M}_{m_i}, \qquad (3.16)
$$

where $\mathcal{M}_n$ denotes the algebra of complex $n\times n$ matrices.
Thus $\dim\mathcal{V}' = \sum_{i=1}^r m_i^2$. We have the corresponding isomorphism

$$
\mathcal{H} \cong \bigoplus_{i=1}^r \mathbb{C}^{m_i}\otimes\mathbb{C}^{n_i}, \qquad (3.17)
$$

and each factor $\mathbb{C}^{m_i}$ is referred to as a noiseless subsystem because it is effectively decoupled from the error channel $T$. It is rather obvious that, in order to be of any use, a noiseless subsystem must be nontrivial, i.e., at least two-dimensional. Now, if the interaction algebra $\mathcal{V}$ is irreducible, then $\dim\mathcal{V}' = 1$, and no noiseless subsystems exist. There is a simple necessary and sufficient condition for irreducibility of an algebra, Schur's lemma [..., p. 47], which states that a *-algebra $\mathcal{A}$ is irreducible if and only if its commutant $\mathcal{A}'$ consists of complex multiples of the identity.

3.4.2 Impossibility of perfect error correction 

We now draw our attention to the correctability of errors modeled by strictly contractive channels. Consider a strictly contractive channel $T:\mathcal{S}(\mathcal{K})\to\mathcal{S}(\mathcal{K})$. It is easy to see that there does not exist a subspace of $\mathcal{K}$ that could serve as a $T$-correcting code. Indeed, suppose to the contrary that $\mathcal{H}_{\mathrm{c}}$ is such a subspace. Then there exists a channel $R:\mathcal{S}(\mathcal{K})\to\mathcal{S}(\mathcal{K})$ such that Eq. (3.13) holds. However, because $T$ is strictly contractive, the channel $R\circ T$ is also strictly contractive with $k(R\circ T)\le k(T)$. Therefore, for any $\rho,\rho'\in\mathcal{S}(\mathcal{H}_{\mathrm{c}})$ we have

$$
\|(R\circ T)(\rho-\rho')\|_1 \le k(T)\,\|\rho-\rho'\|_1 .
$$

On the other hand, Eq. (3.13) implies that

$$
\|(R\circ T)(\rho-\rho')\|_1 = \|\rho-\rho'\|_1,
$$

which, for $\rho\ne\rho'$, would be true only for $k(T)\ge 1$. This is a contradiction, so we see that no strictly contractive channel $T$ admits a perfect quantum error-correcting code. On the other hand, keeping in mind the fact that the Knill-Laflamme theory allows for approximate correctability of errors, we can conclude that the nonexistence of perfect error-correcting codes is not likely to be a serious problem.

It turns out, however, that the property of strict contractivity is so strong that no 
strictly contractive channel admits noiseless subsystems. As a warm-up, let us prove the 
special case when the channel in question is also bistochastic [104]. 

Theorem 3.4.4 Consider a bistochastic strictly contractive channel $T:\mathcal{S}(\mathcal{H})\to\mathcal{S}(\mathcal{H})$, where $\mathcal{H}$ is a finite-dimensional Hilbert space. Then the interaction algebra $\mathcal{V}$ of $T$ is irreducible, i.e., $T$ admits no noiseless subsystems.

Proof: We first observe that any operator $X$ that belongs to the commutant of $\mathcal{V}$ must necessarily be a fixed point of $T$ in $\mathcal{B}(\mathcal{H})$. Indeed, if $X\in\mathcal{V}'$, then

$$
T(X) = \sum_\alpha V_\alpha X V_\alpha^* = X\sum_\alpha V_\alpha V_\alpha^* = X,
$$

where we used the fact that $\sum_\alpha V_\alpha V_\alpha^* = I$ for a bistochastic channel. Now if $X\in\mathcal{V}'$, then we also have $X^*\in\mathcal{V}'$, because $\mathcal{V}$ is closed under taking adjoints. This implies that, for any $X\in\mathcal{B}(\mathcal{H})$, $X_1 := (X+X^*)/2$ and $X_2 := (X-X^*)/2i$ are in $\mathcal{V}'$ whenever $X$ is. We can therefore restrict ourselves to self-adjoint operators in the commutant of $\mathcal{V}$.

For any self-adjoint operator $X$, the operator $|X| = (X^*X)^{1/2}$ belongs to the C*-algebra generated by $X$ [16, p. 34], whence

$$
X = X^*\in\mathcal{V}' \;\Longrightarrow\; X_\pm := \frac{|X|\pm X}{2}\in\mathcal{V}'.
$$

Since $X = X_+ - X_-$ and $X_\pm\ge 0$, we reduce our task to showing that any positive $X$ in the commutant of $\mathcal{V}$ is a multiple of the identity. Without loss of generality we may assume that $X\ne 0$ and $\operatorname{tr}X = 1$, which, together with the positivity of $X$, implies that $X$ is a density operator. But the only density operator left invariant by $T$ is the maximally mixed state $\mathbb{1}/\dim\mathcal{H}$ (it is invariant because $T$ is bistochastic, and it is the only invariant state because $T$ is strictly contractive), so we conclude that $\mathcal{V}' = \mathbb{C}I$, i.e., $\mathcal{V}$ is irreducible by Schur's lemma. ■



Remark: Incidentally, one can use Theorem 3.4.4 to show that the multiples of the identity are the only operators in $\mathcal{B}(\mathcal{H})$ that are left invariant by the Heisenberg-picture version of $T$. This is a consequence of a theorem of Fannes, Nachtergaele, and Werner [15, 40], which in the finite-dimensional case states that if there exists an invertible density operator left invariant by $T$, then the fixed-point set of the Heisenberg-picture version of $T$ is precisely the commutant of the interaction algebra of $T$. □

We now prove the general case [104]. Whereas the proof of Theorem 3.4.4 relied only 
on the existence and uniqueness of the T-invariant state, the proof below directly exploits 
the property of strict contractivity. 

Theorem 3.4.5 Consider a strictly contractive channel $T:\mathcal{S}(\mathcal{H})\to\mathcal{S}(\mathcal{H})$, where $\mathcal{H}$ is a finite-dimensional Hilbert space. Then the interaction algebra $\mathcal{V}$ of $T$ is irreducible, i.e., $T$ admits no noiseless subsystems.

Proof: Let $\mathcal{V}$ be the interaction algebra of the channel $T$. Let us suppose, contrary to the statement of the theorem, that $T$ admits at least one noiseless subsystem (i.e., $\mathcal{V}$ is reducible). That is, there exists at least one $j\in\{1,\ldots,r\}$ such that $m_j \ge 2$ in Eqs. (3.15)-(3.17). Let $\mathcal{L}$ be some closed subspace of $\mathcal{H}$. Restricting the channel $T$ to the set

$$
\mathcal{S}(\mathcal{L}) := \{\rho\in\mathcal{S}(\mathcal{H}) \mid \operatorname{supp}\rho\subseteq\mathcal{L}\}
$$

(where $\operatorname{supp}\rho$ is the orthogonal complement of $\ker\rho$), we note that, by definition, the contractivity modulus of the restricted channel cannot exceed the contractivity modulus of $T$. Let $\mathcal{H}_j$ be the $j$th direct summand $\mathbb{C}^{m_j}\otimes\mathbb{C}^{n_j}$ in Eq. (3.17). Define the channel $T_j$ as the restriction of $T$ to $\mathcal{S}(\mathcal{H}_j)$. Then any Kraus operator of $T_j$ has the form $I_{m_j}\otimes V_\alpha^{(j)}$, where $V_\alpha^{(j)}\in\mathcal{M}_{n_j}$ and $\sum_\alpha V_\alpha^{(j)*}V_\alpha^{(j)} = I_{n_j}$. Furthermore $k(T_j) \le k(T) < 1$. Now $T_j$ is the channel of the form $\mathrm{id}\otimes S_j$, where $S_j$ is the channel on $\mathcal{S}(\mathbb{C}^{n_j})$ with Kraus operators $V_\alpha^{(j)}$. As can be easily seen, channels of this form are not strictly contractive: if $\sigma$ is a fixed point of $S_j$, then $\rho\otimes\sigma$ is a fixed point of $T_j$ for every $\rho\in\mathcal{S}(\mathbb{C}^{m_j})$, so $T_j$ has infinitely many fixed points. Thus $k(T_j) = 1$, and the theorem is proved, reductio ad absurdum. ■



The statement of Theorem 3.4.5 is quite shocking as it unequivocally rules out the 
existence of noiseless subsystems for any strictly contractive channel. From the standpoint 
of foundations of quantum theory, the importance of Theorem 3.4.5 lies in the fact that 
it establishes nonexistence of noiseless subsystems for a wide class of physically realizable 
quantum computers on the basis of a minimal set of assumptions. Furthermore, from the 
mathematical point of view, it is rather remarkable that strict contractivity of a channel 
already implies irreducibility of its interaction algebra. We must, however, hasten to 
emphasize that, despite its sweeping generality, Theorem 3.4.5 should not be considered
as a proof of impossibility of building a reliable quantum computer. It merely rules out the 
possibility of building quantum computers with perfect protection against errors modeled 
by strictly contractive channels. 

3.4.3 Approximate error correction 

At this point we must realize that the results of the previous section are not as unexpected 
as they may seem. After all, nothing is perfect in the real world! Therefore, our error 
correction schemes must, at best, come as close as possible to the perfect scenario. Of 
course, the precise criteria for determining how close a given error correction scheme is 
to the "perfect case" will vary depending on the particular situation, but we can state 
perhaps the most obvious criterion in terms of distinguishability of channels. 

Let us first phrase everything in abstract terms. Let the error mechanism affecting the computer be modeled by some channel $T$. We assume that there exists some positive $\delta < 1$ which, in some way, characterizes the channel $T$ (it could be given, e.g., by the minimum of the operator norms of the Kraus operators of $T$, and thus quantify the "smallest" probability of an error occurring). Let $\mathcal{H}$ be the Hilbert space associated with the computer. Then, for each $\epsilon > 0$, we define an $(\epsilon,\delta)$-approximate error-correcting scheme for $T$ to consist of the following objects:

(1) an integer $n \ge 1$,
(2) a Hilbert space $\mathcal{H}_{\mathrm{ext}}$ with $\dim\mathcal{H}_{\mathrm{ext}} \ge \dim\mathcal{H}$,
(3) a channel $E:\mathcal{S}(\mathcal{H})\to\mathcal{S}(\mathcal{H}_{\mathrm{ext}})$,
(4) a channel $\widetilde{T}:\mathcal{S}(\mathcal{H}_{\mathrm{ext}})\to\mathcal{S}(\mathcal{H}_{\mathrm{ext}})$, and
(5) a completely positive (CP) map $T_{\mathrm{corr}}:\mathcal{S}(\mathcal{H}_{\mathrm{ext}})\to\mathcal{S}(\mathcal{H}_{\mathrm{ext}})$,

such that the channel $\widetilde{T}$ is uniquely determined by $n$, $\mathcal{H}_{\mathrm{ext}}$, $T$, and $E$; the CP map $T_{\mathrm{corr}}$ is correctable (say, in the Knill-Laflamme sense, or through other means, depending on the particular situation); and we have the estimate

$$
\|\widetilde{T} - T_{\mathrm{corr}}\|_{\mathrm{cb}} \le \delta^n < \epsilon. \qquad (3.18)
$$
Let us give a concrete example in order to illustrate the above definition. Suppose that the channel $T$ is of the form $\mathrm{id} + S$ with $\|S\|_{\mathrm{cb}} < \delta$. Then, for any $n$, we can write

$$
T^{\otimes n} = \mathrm{id} + \sum_{\substack{A\subseteq\{1,\ldots,n\}\\ 0<|A|<n}} \bigotimes_{k=1}^{n} S^{1_A(k)} + S^{\otimes n}, \qquad (3.19)
$$

where $|A|$ denotes the cardinality of the set $A$, and $1_A:\{1,\ldots,n\}\to\{0,1\}$ is the indicator function of $A$. We use the convention that, for any map $M$, $M^0 = \mathrm{id}$. In other words, the summation on the right-hand side of Eq. (3.19) consists of tensor product terms with one or more identity factors. For the last term, we have $\|S^{\otimes n}\|_{\mathrm{cb}} \le \|S\|_{\mathrm{cb}}^n < \delta^n$.

In this case, given some $\epsilon > 0$, we pick $n$ such that $\delta^n < \epsilon$ and let $\mathcal{H}_{\mathrm{ext}} := \mathcal{H}^{\otimes n}$. If the CP map given by the sum of the first two terms on the right-hand side of Eq. (3.19) is correctable on some subspace $\mathcal{H}_{\mathrm{c}}$ of $\mathcal{H}_{\mathrm{ext}}$, then the channel $E$ is defined in a natural way through the composition of the following two operations: (a) adjoining $n-1$ additional copies of $\mathcal{H}$ in some suitable state $\rho_0$, and (b) restricting to the subspace $\mathcal{H}_{\mathrm{c}}$. This way, we obviously have $\widetilde{T} := T^{\otimes n}$ and

$$
T_{\mathrm{corr}} := \mathrm{id} + \sum_{\substack{A\subseteq\{1,\ldots,n\}\\ 0<|A|<n}} \bigotimes_{k=1}^{n} S^{1_A(k)}.
$$

The estimate (3.18) holds because $\widetilde{T} - T_{\mathrm{corr}} = S^{\otimes n}$. We note that this construction results in a quantum error-correcting code that corrects any $n-1$ errors. We can use similar reasoning to describe quantum codes that correct $k < n$ errors.
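Choosing $n$ in this construction amounts to solving $\delta^n<\epsilon$. The sketch below, with hypothetical values $\delta=0.2$ and $\epsilon=10^{-3}$, computes the smallest such $n$ and also counts the middle terms of Eq. (3.19), one for each subset $A$ with $0<|A|<n$, of which there are $2^n-2$. The function names are ours.

```python
import math
from itertools import combinations

def copies_needed(delta, eps):
    """Smallest integer n >= 1 with delta**n < eps (assumes 0 < delta < 1, eps > 0)."""
    if not 0.0 < delta < 1.0 or eps <= 0.0:
        raise ValueError("need 0 < delta < 1 and eps > 0")
    return max(1, math.floor(math.log(eps) / math.log(delta)) + 1)

def middle_terms(n):
    """Number of subsets A of {1, ..., n} with 0 < |A| < n (the middle sum in (3.19))."""
    return sum(1 for size in range(1, n) for _ in combinations(range(n), size))

n = copies_needed(0.2, 1e-3)   # hypothetical delta = 0.2, eps = 10^-3
print(n, middle_terms(n))      # 5 30
```

The count grows as $2^n$, so the correctable part $T_{\mathrm{corr}}$ contains exponentially many error patterns even though its distance to $\widetilde{T}$ shrinks only as $\delta^n$.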

Constructing $\mathcal{H}_{\mathrm{ext}}$ as a tensor product of a number of copies of $\mathcal{H}$, the Hilbert space of the computer, evidently leads to the usual schemes for fault-tolerant quantum computation [102]. Other solutions, such as embedding the finite-dimensional Hilbert space $\mathcal{H}$ in a suitable infinite-dimensional Hilbert space (e.g., encoding a qubit in a harmonic oscillator [52]), can also be formulated in a manner consistent with our definition above.

Let us now address approximate correctability of strictly contractive errors. We have previously demonstrated that, in the absence of error correction, the sensitivity of quantum memories and computers to such errors grows exponentially with storage and computation time respectively. Let $T$ be a strictly contractive error channel. It is obvious that the appropriate approximate error correction scheme must be such that the contraction rate of the "encoded" computer, where the errors are now modeled by the channel $\widetilde{T}$, is effectively slowed down. In some cases, straightforward tensor-product realization may prove useful (e.g., when the product channel $T\otimes T$ is not strictly contractive). We must recall that, for any channel $S$, a necessary condition for correctability is $k(S) = 1$. Thus, if we can find a suitable approximate error-correcting scheme where $\widetilde{T}$ would be well approximated by some channel $T_{\mathrm{corr}}$ with $k(T_{\mathrm{corr}}) = 1$, we may effectively slow down the contraction rate by protecting the encoded computer against errors modeled by $T_{\mathrm{corr}}$. A more ingenious approach may call for replacing circuit-based quantum computation with that in massively parallel arrays of interacting particles; several such implementations have already been proposed (see, e.g., [18]). It is quite likely that the possible "encodings" of quantum computation in these massively parallel systems offer a more efficient implementation of approximate error correction.
Finally, we should mention that the idea of "approximate" noiseless subsystems has already been explored by Bacon, Lidar, and Whaley [7]. In their work, it is argued that the symmetry, which is required of a channel in order for noiseless subsystems to exist, is generally broken by perturbing the channel. They show that, if the perturbations of the channel are "reasonable," then the noiseless subsystem is stable to second order in time.
We must reiterate that the negative results we have stated in the previous section refer 
only to nonexistence of "perfectly" noiseless subsystems; in the real world, we have no 
choice but to settle for "almost perfect" anyway. 

3.5 Implications for quantum information processing 
3.5.1 General considerations 

As we have seen in Section 3.3, the maximum number $n_{\max}$ of operations that can be carried out on a physically realizable quantum computer in the presence of strictly contractive noise is limited by the contraction rate $k$ and the measurement precision $\epsilon$, and is equal to $\log(\epsilon/2)/\log k$. The measurement precision $\epsilon$ depends on the measuring apparatus, while
the contraction rate k is determined by the decoherence mechanism. In the next section 
we will present an elementary analysis of noisy bulk spin-resonance quantum computation 
[25, 48] in terms of the strictly contractive decoherence model; here we focus on the 
quantitative conclusions that can be drawn regardless of the type of "hardware" used for 
building the quantum computer. 

First of all, let us make an obvious observation that the number of operations that can 
be carried out within the "coherence time" of the computer is related not to the size of 
the corresponding quantum circuit (i.e., the total number of gates used to construct it), 
but rather to the depth of the circuit (i.e., the maximum number of gates acting on any 
qubit throughout the computation). It is quite clear that the complexity-theoretic circuit 
depth is irrelevant here; what matters is the physical circuit depth, which is, of course, 
determined by the particular realization of the computer. With that in mind, let $D_A(n)$ denote the physical circuit depth for some quantum algorithm $A$ with the input state of $n$ qubits. Then, if the contraction rate $k$ is fixed, the required measurement precision is easily seen to be given by

$$
\epsilon = 2k^{D_A(n)}. \qquad (3.20)
$$

Because the contraction rate can be written as $k = 1/\sqrt[a]{2}$, where $a$ is some large positive number, we can rewrite Eq. (3.20) as

$$
\epsilon = \frac{1}{2^{\,cD_A(n)-1}},
$$

where $c = 1/a$ is a constant that depends on the decoherence mechanism and increases as the noise gets stronger. Thus we see that the required measurement precision grows exponentially with the physical circuit depth: $\epsilon$ must shrink as $2^{-cD_A(n)}$.

Alternatively we can consider the case when we are given $\epsilon$ and $D_A(n)$, and need to determine the maximum tolerable error rate. Then, whenever $k > (\epsilon/2)^{1/D_A(n)}$, we will have $n_{\max} > D_A(n)$. When the noise is sufficiently weak, we can approximate it with a depolarizing channel (cf. Section 3.2.4), in which case $k = 1-\eta$, and the depolarization constant $\eta$ can be thought of as the error rate. If the computation is to be concluded within the coherence time, then the maximum allowable error rate is given by $\eta_{\max} = 1-(\epsilon/2)^{1/D_A(n)}$, whence we see that, in order to build fault-tolerant circuit-based quantum computers, we need high-precision measurements and shallow circuits. To make an (admittedly academic) illustration of this, we provide in the table below the values of the threshold error rate for algorithms with various physical circuit depths for the case when the measurement precision is on the order of $\sqrt{\hbar}$, comparable to the so-called standard quantum limit (SQL) [14].



D(n) \ n (number of qubits) | 20           | 40           | 60           | 80            | 100
log n                       | ~1           | 0.999        | 0.999        | 0.998         | 0.998
n                           | 0.863        | 0.630        | 0.485        | 0.392         | 0.328
n^3                         | 4.96 × 10^-3 | 6.22 × 10^-4 | 1.84 × 10^-4 | 7.78 × 10^-5  | 3.98 × 10^-5
2^(n/2)                     | 0.038        | 3.80 × 10^-5 | 3.71 × 10^-8 | 3.62 × 10^-11 | 3.53 × 10^-14


Despite the fact that the measurement precision we have assumed is ridiculously high ($\epsilon \sim 10^{-17}$), the maximum tolerable error rate is still prohibitively low for circuits of polynomial and superpolynomial depth, even when the number of qubits is quite modest. It is worth noticing, however, that the threshold error rate starts off very close to unity and rolls off fairly slowly when the quantum circuit has logarithmic or linear depth. We will come back to this point in Section 3.5.3, when we talk about parallelization as a means of protecting the computer against noise.
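The threshold values in the table can be recomputed in a few lines. The following Python sketch evaluates $1-(\epsilon/2)^{1/D(n)}$ assuming $\epsilon = \sqrt{\hbar}$ in SI units and the depth functions $\log_2 n$, $n$, $n^3$ and $2^{n/2}$; the constant `HBAR`, the dictionary `DEPTHS` and the helper `threshold_error_rate` are our own illustrative choices, inferred from the tabulated numbers rather than stated in the text.

```python
import math

# Assumption: measurement precision eps = sqrt(hbar) in SI units (~1.03e-17).
HBAR = 1.054571817e-34  # reduced Planck constant, J*s
EPS = math.sqrt(HBAR)

# Assumed physical circuit depths D(n) for the four rows of the table.
DEPTHS = {
    "log n": lambda n: math.log2(n),
    "n": lambda n: float(n),
    "n^3": lambda n: float(n) ** 3,
    "2^(n/2)": lambda n: 2.0 ** (n / 2),
}


def threshold_error_rate(eps: float, depth: float) -> float:
    """Largest depolarization constant eta for which n_max >= depth.

    With kappa = 1 - eta, the condition kappa >= (eps/2)**(1/depth) gives
    eta <= 1 - (eps/2)**(1/depth). We evaluate 1 - exp(x) as -expm1(x)
    so that very small thresholds (deep circuits) keep full precision.
    """
    return -math.expm1(math.log(eps / 2) / depth)


if __name__ == "__main__":
    qubits = [20, 40, 60, 80, 100]
    for label, depth in DEPTHS.items():
        row = [threshold_error_rate(EPS, depth(n)) for n in qubits]
        print(f"{label:>8}: " + "  ".join(f"{r:.3g}" for r in row))
```

Using `expm1` matters for the last row: for $n = 100$ the exponent is of order $10^{-14}$, and computing `1 - math.exp(x)` directly would lose most of the significant digits.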

3.5.2 Case study: ensemble quantum computation using nuclear 
magnetic resonance 

Looking back to the formula $n_{\max} = \log(\epsilon/2)/\log\kappa$, we can pose the following question.
Given a particular experimental scheme for realizing a quantum computer, what can we 
say about the measurement precision and about the noise strength (contraction rate)? In 
this section we carry out a simple analysis in order to answer this question for ensemble 
quantum computation using nuclear magnetic resonance, proposed in 1997 independently 
by Cory, Fahmy, and Havel [25], and by Gershenfeld and Chuang [In]. From now on we will 
use the term "NMR quantum computation" to refer to this scheme; a more descriptive, 
and also more cumbersome, term would be "high-temperature liquid-state NMR quantum 
computation." 

The basic idea behind NMR quantum computation is the following. An $N$-spin NMR
quantum computer operates on a sample solution containing a huge number of molecules 
(on the order of 10^^), each of which accommodates two-level nuclear spins. The sam- 
ple, which is placed in a strong unidirectional magnetic field, is subjected to a temporal 
sequence of radio-frequency pulses, and each molecule functions as an autonomous com- 
putational unit. The result of the computation, which is read off by means of the usual 
techniques of NMR spectroscopy [39], is the ensemble average of the computer outputs 
taken with respect to the state of all the molecules in the sample. Because NMR ex- 
periments are conducted at room temperature (~ 300 K), the initial state of the sample 
is the thermal equilibrium state $\exp(-\beta H)/Z_\beta$, where $H$ is the Hamiltonian of a single molecule.

On a more formal level, the inner workings of an NMR quantum computer can be 
described using the concept of an effective pure state, which is defined as follows [22]. 

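Definition 3.5.1 Let $\mathcal{H}$ be a Hilbert space. Consider a unit vector $\psi \in \mathcal{H}$, a channel $T : \mathcal{S}(\mathcal{H}) \to \mathcal{S}(\mathcal{H})$, and a set $\{X_i\}$ of observables in $\mathcal{B}(\mathcal{H})$. Then the state $\rho \in \mathcal{S}(\mathcal{H})$ is called an effective pure state for $\psi$ with respect to $T$ and $\{X_i\}$ if there exist another channel $T'$ and a constant $\alpha$ such that, for each $i$,

$$\mathrm{tr}[T(\rho)X_i] = \alpha\,\langle\psi|\widehat{T'}(X_i)\psi\rangle. \qquad (3.21)$$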

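Here is a concrete example [22] to illustrate this abstract definition. Let $T$ be a bistochastic channel, i.e., $T(I) = I$. Then, for any $\alpha \in (0,1)$, the state

$$\rho_\alpha := (1-\alpha)\,I/\dim\mathcal{H} + \alpha\,|\psi\rangle\langle\psi| \qquad (3.22)$$

is an effective pure state, with $T' = T$, for the pure state $\psi$ with respect to $T$ and any set of traceless observables. Indeed, for any $X$ with $\mathrm{tr}\,X = 0$, we have

$$\mathrm{tr}[T(\rho_\alpha)X] = \frac{1-\alpha}{\dim\mathcal{H}}\,\mathrm{tr}[T(I)X] + \alpha\,\langle\psi|\widehat{T}(X)\psi\rangle = \alpha\,\langle\psi|\widehat{T}(X)\psi\rangle,$$

where the first term vanishes because $T(I) = I$ and $\mathrm{tr}\,X = 0$, so that the condition (3.21) is satisfied.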
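The identity can also be checked numerically. The sketch below assumes NumPy and uses, as a hypothetical example of a bistochastic channel, a random mixture of unitary conjugations; the helpers `random_unitary` and `mixed_unitary_channel` are illustrative names, not part of any library. The term $\alpha\langle\psi|\widehat{T}(X)\psi\rangle$ is evaluated through its Schrödinger-picture form $\alpha\,\mathrm{tr}[T(|\psi\rangle\langle\psi|)X]$.

```python
import numpy as np


def random_unitary(d, rng):
    """Haar-random unitary via QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # multiply column j by phases[j]


def mixed_unitary_channel(unitaries, probs):
    """T(rho) = sum_k p_k U_k rho U_k^dagger: trace preserving and unital."""
    def T(rho):
        return sum(p * U @ rho @ U.conj().T for U, p in zip(unitaries, probs))
    return T


rng = np.random.default_rng(0)
d, alpha = 4, 0.3

# Pure state psi and the effective pure state rho_alpha of Eq. (3.22)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
P = np.outer(psi, psi.conj())
rho_alpha = (1 - alpha) * np.eye(d) / d + alpha * P

T = mixed_unitary_channel([random_unitary(d, rng) for _ in range(3)], [0.5, 0.3, 0.2])

# Random traceless Hermitian observable X
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
X = (A + A.conj().T) / 2
X -= np.trace(X) / d * np.eye(d)

lhs = np.trace(T(rho_alpha) @ X)
rhs = alpha * np.trace(T(P) @ X)  # = alpha <psi| T^(X) psi> by duality
print(abs(lhs - rhs))  # zero up to rounding error
```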

The significance of the above formalism for NMR quantum computation comes from the fact that the Gibbs state of the liquid sample is well approximated by a state of the form (3.22) with $\dim\mathcal{H} = 2^N$ and $\alpha = N\hbar\Omega\beta/2^{N+1}$, where $\hbar\Omega$ is the average difference between the excited-state and the ground-state energies of the nuclear spins in a strong magnetic field [18, 117]. The quantity $\hbar\Omega\beta/2$ is referred to as the Boltzmann factor [22]. To give a feel for the orders of magnitude involved, the average resonant frequency of a nuclear spin in a typical NMR experiment is on the order of 200 MHz [18], which at room temperature corresponds to $\alpha \approx 1.6\times 10^{-5}\,N/2^N$. One crucial feature of an effective pure state of the form (3.22) is that, for any bistochastic channel $T$, we have

$$T(\rho_\alpha) = (1-\alpha)\,2^{-N}I + \alpha\,T(|\psi\rangle\langle\psi|),$$

i.e., the polarized spins that participate in the actual computation evolve independently of 
the unpolarized spins forming the "thermal background" in the sample. Then, provided 
that a suitable set of traceless observables is measured at the end of the computation, the 
only detectable signal comes from the pure portion of $\rho_\alpha$, the obvious disadvantage being that the corresponding signal strength is $O(N/2^N)$, which decreases rapidly as the number
of spins per molecule increases. 
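The order-of-magnitude estimate above is simple arithmetic with $\hbar\Omega\beta/2 = h\nu/(2k_BT)$. The sketch below uses the exact SI values of $h$ and $k_B$; the function names are hypothetical, and the default arguments (200 MHz, 300 K) are the typical values quoted in the text.

```python
H_PLANCK = 6.62607015e-34   # J*s (exact SI value)
K_BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def boltzmann_factor(freq_hz: float, temp_k: float) -> float:
    """hbar*Omega*beta/2 = h*nu / (2 k_B T) for resonance frequency nu."""
    return H_PLANCK * freq_hz / (2.0 * K_BOLTZMANN * temp_k)


def effective_polarization(n_spins: int, freq_hz: float = 200e6,
                           temp_k: float = 300.0) -> float:
    """alpha = N * hbar*Omega*beta / 2**(N+1): weight of the pure part of (3.22)."""
    return n_spins * boltzmann_factor(freq_hz, temp_k) / 2 ** n_spins


print(f"{boltzmann_factor(200e6, 300.0):.3g}")  # about 1.6e-05
for n in (2, 7, 20):
    # The pure-state signal shrinks roughly as N / 2**N
    print(n, f"{effective_polarization(n):.2e}")
```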

An NMR quantum computer is usually run many times, and each time a single-spin 
observable is measured; the measurement results are then processed on a classical computer 
[22]. Let $\sigma_1^n, \sigma_2^n, \sigma_3^n$ denote the Pauli spin matrices acting on the Hilbert space of the $n$th spin. A typical observable measured on the $n$th spin after a single run of the computer is equal, up to a multiplicative constant, to $M_n := \sigma_1^n + i\sigma_2^n$, so that the experimentally detected output is proportional to the transverse magnetization of the sample, $N_s\,\mathrm{tr}(\rho M_n)$, where $N_s$ is the number of molecules in the sample and $\rho$ is the state of the sample after the computation [18]. Using this information, we can give a concrete interpretation of the measurement precision $\epsilon$. Consider the measurement of an arbitrary observable $A$. For any two density operators $\rho$ and $\rho'$, we have the bound

$$|\mathrm{tr}(\rho A) - \mathrm{tr}(\rho' A)| \le \|A\|\,\|\rho - \rho'\|_1. \qquad (3.23)$$

Now let $\Delta_A$ be a typical (e.g., r.m.s.) fluctuation of $A$. If we stipulate that the resolution of the measurement of $A$ is limited by $\Delta_A$, then Eq. (3.23) suggests that any two density operators $\rho, \rho'$ with $\|\rho - \rho'\|_1 < \Delta_A/\|A\|$ can be considered indistinguishable. When we are talking about the number $N_s$ of molecules in a macroscopic sample, the corresponding fluctuation is given by $\sqrt{N_s}$ [71, Ch. 1]. The detected signal corresponds to $A = N_s M_n$, so, because $\|M_n\| = 2$, we have $\epsilon = \sqrt{N_s}/(2N_s) = 1/(2\sqrt{N_s})$, which yields, for a macroscopic sample of the size quoted above, the value $\epsilon \sim 10^{-12}$. This gives us a rough (order-of-magnitude) estimate of the measurement precision in NMR quantum computers.
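The two ingredients of this estimate, $\|M_n\| = 2$ and the bound (3.23), can be spot-checked numerically. This is a sketch assuming NumPy; `random_density` and `precision` are hypothetical helper names, and `precision` simply encodes the order-of-magnitude formula $\epsilon = 1/(2\sqrt{N_s})$ from the text.

```python
import numpy as np

# Pauli matrices and the single-spin observable M_n = sigma_1 + i sigma_2
SIGMA1 = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
M = SIGMA1 + 1j * SIGMA2  # equals [[0, 2], [0, 0]]


def op_norm(A):
    """Operator norm = largest singular value."""
    return np.linalg.norm(A, ord=2)


def trace_norm(A):
    """Trace norm = sum of singular values."""
    return np.linalg.norm(A, ord="nuc")


def random_density(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


def precision(n_molecules):
    """epsilon = 1/(2 sqrt(N_s)), the order-of-magnitude estimate in the text."""
    return 1.0 / (2.0 * np.sqrt(n_molecules))


rng = np.random.default_rng(1)
worst_ratio = 0.0
for _ in range(1000):
    rho, rho2 = random_density(2, rng), random_density(2, rng)
    lhs = abs(np.trace(rho @ M) - np.trace(rho2 @ M))
    rhs = op_norm(M) * trace_norm(rho - rho2)
    worst_ratio = max(worst_ratio, lhs / rhs)

print(op_norm(M))    # ~2, i.e. ||M_n|| = 2
print(worst_ratio)   # never exceeds 1, as Eq. (3.23) requires
```

For instance, a purely illustrative sample with $N_s = 10^{24}$ molecules gives `precision(1e24)` $= 5\times 10^{-13}$, of the order quoted above.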

As far as the decoherence mechanism is concerned, we need only consider single-spin dynamics because, after each run of the computer, only the single-spin observables are measured. There are two main sources of decoherence [48, 140], namely thermal relaxation and phase damping; they are described explicitly as follows [140]. The channel that models thermal relaxation is precisely the channel shown in Example 3.2.6; this channel is strictly contractive with $\kappa = e^{-\tau/2T_{\mathrm{th}}}$, where $\tau$ is the duration of a single step of the noisy dynamics, and $T_{\mathrm{th}}$ is the thermal relaxation time. Phase damping is modeled by the channel with the Kraus operators $\sqrt{\lambda}\,I$ and $\sqrt{1-\lambda}\,\sigma_3$, where $\lambda = (1 + e^{-\tau/T_{\mathrm{ph}}})/2$, $T_{\mathrm{ph}}$ being the phase damping time. The phase-damping channel is not strictly contractive because it has two fixed points, $|\psi_+\rangle\langle\psi_+|$ and $|\psi_-\rangle\langle\psi_-|$, where

$$\sigma_3\psi_\pm = \pm\psi_\pm.$$
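A short numerical sketch of the phase-damping channel, assuming NumPy, makes the failure of strict contractivity concrete: both eigenprojections of $\sigma_3$ are fixed, so their trace distance never shrinks, while coherences decay by $e^{-\tau/T_{\mathrm{ph}}}$ per step. The values of `tau` and `t_ph` are hypothetical.

```python
import numpy as np

SIGMA3 = np.diag([1.0, -1.0]).astype(complex)


def phase_damping(rho, tau, t_ph):
    """Kraus operators sqrt(lam) I and sqrt(1 - lam) sigma_3,
    with lam = (1 + exp(-tau / t_ph)) / 2, as described in the text."""
    lam = (1 + np.exp(-tau / t_ph)) / 2
    k0 = np.sqrt(lam) * np.eye(2)
    k1 = np.sqrt(1 - lam) * SIGMA3
    return k0 @ rho @ k0.conj().T + k1 @ rho @ k1.conj().T


tau, t_ph = 0.045, 0.5  # hypothetical step duration and T_ph, in seconds

# The two eigenprojections of sigma_3 are left invariant ...
up = np.diag([1.0, 0.0]).astype(complex)
down = np.diag([0.0, 1.0]).astype(complex)
print(np.allclose(phase_damping(up, tau, t_ph), up),
      np.allclose(phase_damping(down, tau, t_ph), down))

# ... so their trace distance stays at 2: no strict contraction.
gap = np.linalg.norm(phase_damping(up, tau, t_ph) - phase_damping(down, tau, t_ph), "nuc")
print(gap)

# Coherences, by contrast, are multiplied by exp(-tau / t_ph) at each step.
plus = np.full((2, 2), 0.5, dtype=complex)
out = phase_damping(plus, tau, t_ph)
print(out[0, 1].real, 0.5 * np.exp(-tau / t_ph))
```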

Typically the phase damping time is much shorter than the thermal relaxation time [48], the value of the ratio $T_{\mathrm{ph}}/T_{\mathrm{th}}$ depending on the kinetics of the particular molecule. Therefore the phase damping time sets a more stringent limitation on the number of operations that can be carried out on an NMR quantum computer, but, because the channel formed by composing the thermalizing channel and the phase-damping channel is still strictly contractive with $\kappa = e^{-\tau/2T_{\mathrm{th}}}$, only the thermal relaxation time is relevant to our analysis. With this in mind, we can write down the following expression for the maximum number of operations that can be carried out within the thermal relaxation time:

$$n_{\max} = \frac{\log(\epsilon/2)}{\log\kappa} = \frac{\log(10^{-12}/2)}{\log e^{-\tau/2T_{\mathrm{th}}}} = \frac{2T_{\mathrm{th}}}{\tau}\,\ln\left(2\times 10^{12}\right).$$

The duration r of the single step of the noisy dynamics is comparable to the time it 
takes to execute a single unitary operation [48, 140]. A single-spin unitary operation can 
be performed in about 10 ms, whereas it may take roughly 100 ms to apply a two-spin 
gate. A conservative estimate for $\tau$ would therefore be around 45 ms, whence we obtain $n_{\max} \approx 1258.85\,T_{\mathrm{th}}$, where the thermal relaxation time $T_{\mathrm{th}}$ is measured in seconds.

Recently Vandersypen et al. [140] implemented the simplest nontrivial instance of Shor's quantum factoring algorithm (namely, finding the prime factorization of 15) using a 7-spin NMR computer, which required about 300 computational steps. In the molecule they used, the thermal relaxation times of the spins were as small as 2.8 seconds and as large as 45.4 seconds, which implies that the maximum number of operations that could be carried
out using a molecule with spin relaxation times in this range is anywhere from 3,525 
to 57,152. Because Shor's algorithm has cubic complexity, we may infer that the NMR 
quantum computer utilizing such molecules would not be scalable beyond 39 spins. In 
general, the scalability would increase with Tth, so one of the ways to meet the scalability 
challenge would be to engineer molecules with high thermal relaxation times. We mention 
in passing that, as pointed out by Schack and Caves [117], there exists a purely classical
model for NMR quantum computation when the number of the spins per molecule is 
sufficiently low (e.g., when the Boltzmann factor equals 2 x 10~^, the NMR computer with 
$N < 16$ admits a classical model).
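The arithmetic of the last two paragraphs can be reproduced as follows. This sketch reads the contraction rate as $\kappa = e^{-\tau/2T_{\mathrm{th}}}$ and the precision as $\epsilon = 10^{-12}$; both readings are assumptions on our part, chosen because together with $\tau = 45$ ms they reproduce the quoted coefficient 1258.85, and `n_max` is a hypothetical helper name.

```python
import math


def n_max(eps: float, tau: float, t_th: float) -> float:
    """n_max = log(eps/2) / log(kappa), with kappa = exp(-tau / (2 * t_th)).

    Assumption: this reading of the contraction rate is the one that
    reproduces the 1258.85 * T_th figure quoted in the text.
    """
    log_kappa = -tau / (2.0 * t_th)
    return math.log(eps / 2.0) / log_kappa


EPS = 1e-12   # order-of-magnitude measurement precision
TAU = 0.045   # duration of one noisy step, in seconds

per_second = n_max(EPS, TAU, 1.0)   # coefficient of T_th
for t_th in (2.8, 45.4):            # smallest/largest T_th in the molecule
    ops = n_max(EPS, TAU, t_th)
    # Shor's algorithm needs ~N^3 steps, so at most N ~ ops ** (1/3) spins
    print(f"T_th = {t_th:5.1f} s: n_max ~ {ops:8.0f}, N ~ {ops ** (1 / 3):.1f}")
print(f"n_max ~ {per_second:.2f} * T_th")
```

For the longest relaxation time the cube root comes out at about 38.5, which is consistent, up to rounding, with the 39-spin figure quoted above.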

3.5.3 Where do we go from here? 

The main lesson to be learned from the strictly contractive model of decoherence is the 
following: the longer the computation, the less reliable its output. The same problem arises 
in classical circuit-based computation with noisy gates, and there are two ways to handle it: 
(a) error-correcting codes, and (b) parallelization. The first of these techniques amounts 
to introducing a considerable amount of redundancy into the network. In particular, 
it was shown by Dobrushin and Ortyukov [34] (cf. also the more refined argument by Pippenger, Stamoulis, and Tsitsiklis [101]) that, if a noiseless network requires $N$ gates to compute a particular function, then the noisy network would require at least on the order of $N\log N$ gates to compute the same function reliably, provided that the error probability per gate does not exceed 1/2. The parallelization technique, on the other hand, allows one to reduce the computation time by shrinking the circuit depth.

The problems that are efficiently parallelizable (i.e., can be computed in polylogarith- 
mic time using a polynomial number of processors working in parallel) form the complexity 
class NC [96, Ch. 15]. The abbreviation NC stands for "Nick's class," after Nicholas Pip- 
penger who extensively studied this complexity class. Typical problems in NC are, e.g., 
summing m numbers [which can be done in time O(logm)], or copying the contents of a 
particular memory cell into n'-^^^^ memory cells [which can be done in time O(logn)] [5, p. 
253 ff.]. 

Moore and Nilsson [89] recently introduced the quantum complexity class QNC. They
showed that most circuits, including those for performing error correction, can be paral- 
lelized to logarithmic depth. A notable exception is the circuit for the quantum Fourier 
transform (QFT), a crucial ingredient in Shor's factoring algorithm, which can be par- 
allelized only to linear depth. Moore and Nilsson conjectured that it is impossible to 
parallelize the QFT circuit to less than linear depth. In Section 3.5.1 we provided some 
numerical estimates for the threshold error rate in circuit-based quantum computers as a 
function of the circuit depth. We saw that circuits of logarithmic and linear depth turned 
out to be more robust than circuits of polynomial and superpolynomial depth. Hence, if 
one does insist on implementing quantum computers using the circuit paradigm, then it 
may be worthwhile to explore the class QNC further. 

A more radical solution is to abandon the quantum circuits in favor of massively parallel 
systems of locally interacting particles (cellular automata). In a cellular automaton, the state of each particle at integer time $t+1$ is determined by its state, as well as by the states of finitely many neighboring particles, at time $t$. Classical cellular automata, both
deterministic [21, 47] and probabilistic [76, 80], model a rich variety of complex phenomena; 
in particular, they can serve as a computational medium. As was shown by Toom [137], it 
is possible to store reliably a single bit of information in a noisy two-dimensional cellular 
automaton. The approach of Toom was adopted by Gacs [15] and by Gacs and Reif 
[46], who have demonstrated that it is possible to perform reliable computation in noisy
three-dimensional cellular automata. The physical underpinning of reliable computation 
and information storage in noisy cellular automata can be understood in terms of phase 
transitions [76]. The idea is to construct a nonergodic cellular automaton, i.e., one that 
does not have a unique invariant state which it would eventually reach irrespective of 
initial conditions. 

It would be interesting to see how much of this carries over to the quantum domain. 
There are many proposals for computation, both classical and quantum, using quantum 
cellular automata (see, e.g., Briegel and Raussendorf [18], Fussy et al. [11], Lent et al. [77], 
Lloyd [81] or Meyer [88]). The two main attractions of quantum cellular automata are (a) 
massively parallel structure, and (b) the possibility of a phase transition. We have already 
discussed massively parallel structure of quantum cellular automata in Section 3.4.3; here 
we focus our attention on phase transitions. A necessary condition for the existence of 
a phase transition in a cellular automaton is nonergodicity. Richter and Werner [110] 
gave an ergodicity criterion for quantum cellular automata, formulated in terms of the 
completely positive map that describes, in the Heisenberg picture, the transition rule of
the automaton. Assuming that each cell (site) of the automaton is under the influence 
of some strictly contractive error channel T, an interesting problem would be to devise 
such a transition rule that the automaton would be nonergodic. In this respect we should 
mention that, even if $T$ is a strictly contractive channel, it is not at all obvious whether $T \otimes T$ is strictly contractive as well: it has a unique fixed point among the product density operators, but there may also be another fixed point of $T \otimes T$ that is not a product density operator.
The physics of relaxation processes is often understood, at least on a heuristic level, through the consideration of the balance of energy and entropy, as determined by the temperature. There is a thermodynamic function that relates energy, entropy, and temperature, namely the Helmholtz free energy [20, p. 98],

$$F := E - (1/\beta)S,$$

where $E$ is the energy, $S$ is the entropy, and $\beta$ is the inverse temperature. The second law of thermodynamics [36, p. 17] states that no energy-conserving process can decrease the entropy. Another way to state this is to say that, among all the configurations of the system that have the same energy, the ones with the largest entropy are "thermodynamically favorable" [11, pp. 22-24], by which we mean that the corresponding configurations have very large probabilities. We can also deal with processes that do not conserve energy, in which case we are interested in the incremental free energy, $\Delta F = \Delta E - (1/\beta)\Delta S$. We can consider a particular configuration stable if any local modification of this configuration results in $\Delta F > 0$, i.e., the energy cost of the modification more than compensates for the entropy gain. On the other hand, if $\Delta F < 0$, then the energy cost cannot offset the entropy gain, and the corresponding configuration is unstable.

In this chapter we offer an interpretation of the relaxation dynamics of noisy quantum 
computers in terms of the entropy-energy balance. 

4.1 Definition and properties of entropy 

We give a very brief overview of the concept of entropy. Most of the results are just stated 
without proofs. The reader is encouraged to consult the book by Gray [53] for the rigorous 
treatment of entropy in the context of classical information theory; an excellent survey of 
Wehrl [141] is devoted to the concept of entropy in statistical physics. For an abstract 
treatment of entropy in the context of operator algebras, we recommend the book by Ohya 
and Petz [93]. 
In statistical physics, the entropy of a system that can exist in $N$ possible configurations is given, up to a multiplicative constant, by Boltzmann's formula

$$S := \ln N.$$

This assumes, however, that all configurations of the system are equiprobable. When this is not the case, i.e., when the $i$th configuration occurs with probability $w_i$, the entropy is defined as

$$S := -\sum_{i=1}^{N} w_i \ln w_i. \qquad (4.1)$$

Given the probability distribution $w = \{w_i\}$, we will denote the corresponding entropy (4.1) by $S(w)$ or, when we want to exhibit the probabilities explicitly, by $S(\{w_i\})$. The entropy $S(w)$ is referred to as the Shannon entropy of $w$. It can be shown that, for any probability distribution $w$ on an $N$-element set, $0 \le S(w) \le \ln N$, where the lower bound is achieved if and only if $w_i = \delta_{ik}$ for some $k \in \{1, \ldots, N\}$, and the upper bound is achieved if and only if $w$ is the uniform distribution, $w_i = 1/N$ for all $i$. In view of this, it is natural to regard the entropy as a measure of "randomness" of a probability distribution. A crucial property of the entropy is concavity: given any number $\lambda \in [0,1]$, we have $S(\lambda w + (1-\lambda)w') \ge \lambda S(w) + (1-\lambda)S(w')$, where the convex combination $\lambda w + (1-\lambda)w'$ of the probability distributions $w = \{w_i\}$ and $w' = \{w_i'\}$ is the probability distribution $\{\lambda w_i + (1-\lambda)w_i'\}$.
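The bounds $0 \le S(w) \le \ln N$ and the concavity of $S$ are easy to spot-check on random distributions. The sketch below uses only the Python standard library; `shannon_entropy` and `random_distribution` are illustrative helper names, and the convention $0\ln 0 = 0$ is implemented by skipping zero probabilities.

```python
import math
import random


def shannon_entropy(w):
    """S(w) = -sum_i w_i ln w_i, using the convention 0 ln 0 = 0."""
    return sum(-x * math.log(x) for x in w if x > 0)


def random_distribution(n, rng):
    x = [rng.random() for _ in range(n)]
    total = sum(x)
    return [xi / total for xi in x]


N = 5
rng = random.Random(0)
concavity_ok = True
bounds_ok = True
for _ in range(1000):
    w, v = random_distribution(N, rng), random_distribution(N, rng)
    lam = rng.random()
    mix = [lam * a + (1 - lam) * b for a, b in zip(w, v)]
    # Concavity: S(lam w + (1-lam) v) >= lam S(w) + (1-lam) S(v)
    lower = lam * shannon_entropy(w) + (1 - lam) * shannon_entropy(v)
    concavity_ok &= shannon_entropy(mix) >= lower - 1e-12
    # Bounds: 0 <= S(w) <= ln N
    bounds_ok &= 0.0 <= shannon_entropy(w) <= math.log(N) + 1e-12

print(concavity_ok, bounds_ok)
# Extreme cases: a point mass has zero entropy, the uniform law has ln N
print(shannon_entropy([1.0, 0, 0, 0, 0]), shannon_entropy([1 / N] * N), math.log(N))
```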

Now consider a classical system with the $N$-element configuration space $\Omega$ and the corresponding algebra of observables $C(\Omega)$. Then the one-to-one correspondence between the states over $C(\Omega)$ and the probability measures on $\Omega$ allows us to define the entropy of the state $\omega$ as the entropy of the corresponding probability distribution. We see that the only states with zero entropy are the pure states; the mixed states all have strictly positive entropy. The concavity of the entropy then means that mixing leads to an increase in entropy. Because the normalized counting measure on $\Omega$ corresponds to the unique state that maximizes the entropy, we will refer to this state as maximally mixed.

Next we turn to the case of a quantum system, the corresponding algebra of observables being the algebra $\mathcal{B}(\mathcal{H})$ of bounded operators on the Hilbert space $\mathcal{H}$ associated with the system. We assume for simplicity that $\mathcal{H}$ has finite dimension $N$. Then there is a one-to-one correspondence between the states over $\mathcal{B}(\mathcal{H})$ and the density operators in $\mathcal{B}(\mathcal{H})$. Given the density matrix $\rho$, we define its von Neumann entropy as

$$S(\rho) := -\mathrm{tr}\,\rho\ln\rho. \qquad (4.2)$$

The eigenvalues $\lambda_i$ of $\rho$ form a probability distribution on an $N$-element set, and it is clear from the definition (4.2) that the von Neumann entropy of $\rho$ is precisely the Shannon entropy of this probability distribution. This also implies that $S(\rho) = 0$ if and only if $\rho$ is a pure state, and that the unique state that maximizes $S$ is the maximally mixed state $I/N$. The von Neumann entropy enjoys a concavity property similar to that of the Shannon entropy.
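Because $S(\rho)$ is the Shannon entropy of the spectrum of $\rho$, it can be computed from the eigenvalues alone. A minimal sketch, assuming NumPy; `von_neumann_entropy` is a hypothetical helper name, and the tolerance `tol` is our own choice for discarding numerically zero eigenvalues.

```python
import numpy as np


def von_neumann_entropy(rho, tol=1e-12):
    """S(rho) = -tr rho ln rho, computed as the Shannon entropy of the spectrum."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]  # drop (numerically) zero eigenvalues: 0 ln 0 = 0
    return float(-np.sum(evals * np.log(evals)))


N = 4
psi = np.ones(N) / np.sqrt(N)
pure = np.outer(psi, psi.conj())
maximally_mixed = np.eye(N) / N

print(von_neumann_entropy(pure))             # ~0 for a pure state
print(von_neumann_entropy(maximally_mixed))  # ln N, the maximum
print(np.log(N))
```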

The von Neumann entropy is a continuous functional on the state space $\mathcal{S}(\mathcal{H})$ when the latter is given the topology induced by the trace norm. More precisely, we have the following lemma [93, p. 22].
Lemma 4.1.1 (Fannes) Let $\rho, \sigma$ be two density operators on an $N$-dimensional Hilbert space, and suppose that $\|\rho - \sigma\|_1 \le 1/3$. Then

$$|S(\rho) - S(\sigma)| \le \|\rho - \sigma\|_1 \ln N - \eta(\|\rho - \sigma\|_1), \qquad (4.3)$$

where $\eta(t) := t\ln t$.
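The Fannes bound can be tested on pairs of nearby states. In this NumPy sketch the second state is built as $\sigma = (1-s)\rho + s\tau$ with $s \le 1/6$, which guarantees $\|\rho - \sigma\|_1 \le 2s \le 1/3$; this construction and the helper names are our own illustrative choices.

```python
import numpy as np


def entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))


def eta(t):
    """eta(t) = t ln t as in Lemma 4.1.1, with eta(0) = 0."""
    return 0.0 if t == 0 else t * np.log(t)


def random_density(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(2)
N = 4
violations = 0
for _ in range(2000):
    rho, other = random_density(N, rng), random_density(N, rng)
    s = rng.uniform(0, 1 / 6)          # ||rho - sigma||_1 <= 2s <= 1/3
    sigma = (1 - s) * rho + s * other
    t = np.linalg.norm(rho - sigma, "nuc")
    bound = t * np.log(N) - eta(t)     # right-hand side of (4.3)
    if abs(entropy(rho) - entropy(sigma)) > bound + 1e-12:
        violations += 1
print(violations)  # expected: 0
```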

Now suppose that we are given a channel $T$ with the following properties: (1) the $T$-invariant state $\rho_T$ is unique, and (2) $S(T(\rho)) > S(\rho)$ unless $\rho = \rho_T$. Then, since the Fannes inequality (4.3) guarantees that $S$ is continuous in the trace norm, the von Neumann entropy is a strict Liapunov function for $T$. Theorem 3.1.4 can then be used to establish the trace-norm convergence of the orbit $\{T^n(\rho)\}$ to $\rho_T$ for any initial state $\rho$.
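As an illustration of this Liapunov-function argument, the NumPy sketch below iterates a hypothetical strictly contractive bistochastic channel, a fixed unitary rotation followed by depolarization with strength $p$. Its unique invariant state is $I/d$. Along the orbit the entropy never decreases, and the trace distance to $I/d$ shrinks by exactly $(1-p)$ per step. The channel and all parameter values are our own example, not taken from the text.

```python
import numpy as np


def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))


def rotate_then_depolarize(U, p):
    """Example channel T(rho) = (1-p) U rho U^dagger + p I/d.
    It is bistochastic, and for p > 0 its unique invariant state is I/d."""
    d = U.shape[0]

    def T(rho):
        return (1 - p) * U @ rho @ U.conj().T + p * np.eye(d) / d

    return T


rng = np.random.default_rng(6)
d, p = 3, 0.1
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
T = rotate_then_depolarize(U, p)

rho = np.zeros((d, d), dtype=complex)
rho[0, 0] = 1.0  # start from a pure state
entropies, distances = [], []
for n in range(60):
    entropies.append(entropy(rho))
    distances.append(np.linalg.norm(rho - np.eye(d) / d, "nuc"))
    rho = T(rho)

# Entropy never decreases along the orbit ...
monotone = all(b >= a - 1e-12 for a, b in zip(entropies, entropies[1:]))
print(monotone)
# ... and the trace distance to I/d shrinks by the factor (1 - p) per step.
print(distances[10] / distances[0], (1 - p) ** 10)
```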

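The entropy is an extensive property (it scales with system size). That is, if we consider two systems with the Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$, then for any $\rho \in \mathcal{S}(\mathcal{H})$ and any $\sigma \in \mathcal{S}(\mathcal{K})$ we have

$$S(\rho\otimes\sigma) = S(\rho) + S(\sigma).$$

In fact, among all the density operators $\rho \in \mathcal{S}(\mathcal{H}\otimes\mathcal{K})$ with the same restrictions $\rho_{\mathcal{H}} := \mathrm{tr}_{\mathcal{K}}\rho$ and $\rho_{\mathcal{K}} := \mathrm{tr}_{\mathcal{H}}\rho$, the product state $\rho_{\mathcal{H}}\otimes\rho_{\mathcal{K}}$ has the largest entropy, i.e., $S(\rho) \le S(\rho_{\mathcal{H}}) + S(\rho_{\mathcal{K}})$. This property is referred to as the subadditivity of the entropy. There is also a property referred to as strong subadditivity, which consists in the following. Let $\mathcal{H}_1, \mathcal{H}_2, \mathcal{H}_3$ be Hilbert spaces; we will use $\mathrm{tr}_{ij}(\cdot)$ to denote the partial trace over $\mathcal{H}_i\otimes\mathcal{H}_j$. Given a density operator $\rho \in \mathcal{S}(\mathcal{H}_1\otimes\mathcal{H}_2\otimes\mathcal{H}_3)$, define the partial traces $\rho_1 := \mathrm{tr}_{23}\rho$, $\rho_{12} := \mathrm{tr}_3\rho$, and so on. Then

$$S(\rho) + S(\rho_2) \le S(\rho_{12}) + S(\rho_{23}).$$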
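Additivity, subadditivity and strong subadditivity can be spot-checked on a random three-qubit state. The NumPy sketch below implements the partial trace by reshaping the density matrix into one row index and one column index per subsystem; `partial_trace` and `random_density` are our own illustrative helpers, not library functions.

```python
import numpy as np


def entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))


def partial_trace(rho, dims, keep):
    """Reduced density matrix on the subsystems listed in `keep`."""
    keep = sorted(keep)
    n = len(dims)
    t = rho.reshape(list(dims) + list(dims))
    # Trace out the other subsystems, highest index first so axes stay valid.
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        half = t.ndim // 2
        t = np.trace(t, axis1=i, axis2=i + half)
    d = int(np.prod([dims[i] for i in keep]))
    return t.reshape(d, d)


def random_density(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(3)
dims = [2, 2, 2]
rho = random_density(8, rng)

S = entropy(rho)
S1 = entropy(partial_trace(rho, dims, [0]))
S2 = entropy(partial_trace(rho, dims, [1]))
S12 = entropy(partial_trace(rho, dims, [0, 1]))
S23 = entropy(partial_trace(rho, dims, [1, 2]))

print(S12 <= S1 + S2)        # subadditivity on H_1 (x) H_2
print(S + S2 <= S12 + S23)   # strong subadditivity

# Additivity for product states: S(a (x) b) = S(a) + S(b)
a, b = random_density(2, rng), random_density(3, rng)
print(np.isclose(entropy(np.kron(a, b)), entropy(a) + entropy(b)))
```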

The proof of the strong subadditivity, which was first obtained by Lieb and Ruskai [79], 
is far from transparent, in stark contrast to the fairly straightforward proof of the corre- 
sponding property of the Shannon entropy. 

A useful quantity derived from entropy is the so-called relative entropy. The classical definition, for a pair $w, w'$ of probability distributions, is given by

$$S(w|w') := \sum_i w_i \ln\frac{w_i}{w_i'}.$$

It is easy to show that $S(w|w') \ge 0$ with equality if and only if $w = w'$. Sometimes the relative entropy is referred to as the Kullback-Leibler distance, but this is a misnomer because the relative entropy is not symmetric and does not satisfy the triangle inequality. A more appropriate term is the Kullback-Leibler divergence. The quantum relative entropy is defined, for two density operators $\rho$ and $\sigma$, as

$$S(\rho|\sigma) := \mathrm{tr}(\rho\ln\rho - \rho\ln\sigma).$$

The quantum relative entropy has the same positivity property as the corresponding classical quantity, namely $S(\rho|\sigma) \ge 0$ with equality if and only if $\rho = \sigma$. This can be proved using the following lemma [131], which also gives a handy lower bound on $S(\cdot|\cdot)$.
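Lemma 4.1.2 (Streater) Let $\rho$ and $\sigma$ be two density operators. Then

$$S(\rho|\sigma) \ge \frac{1}{2}\|\rho - \sigma\|_2^2, \qquad (4.4)$$

where $\|\cdot\|_2$ is the Hilbert-Schmidt norm.

Proof: Consider the function $\eta(x)$, defined as above, on the interval $I = [0,1]$. For any pair $x, y \in I$ with $y > 0$ we have, by Taylor's theorem with the Lagrange remainder,

$$\eta(x) = \eta(y) + (x-y)\eta'(y) + \frac{1}{2}(x-y)^2\eta''(t)$$

for some $t$ between $x$ and $y$. Now $\eta''(t) = 1/t \ge 1$ for $t \in (0,1]$, which leads to the estimate

$$\eta(x) - \eta(y) - (x-y)\eta'(y) - \frac{1}{2}(x-y)^2 \ge 0. \qquad (4.5)$$

Let $a_i$ and $\phi_i$ be the eigenvalues and the eigenvectors of $\rho$, and let $b_j$ and $\psi_j$ denote the same objects for $\sigma$. Define $g_{ij} := \langle\phi_i|\psi_j\rangle$, so that $\sum_j |g_{ij}|^2 = 1$. Then

$$\left\langle\phi_i\middle|\left[\eta(\rho) - \eta(\sigma) - (\rho-\sigma)\eta'(\sigma) - \tfrac{1}{2}(\rho-\sigma)^2\right]\phi_i\right\rangle = \sum_j |g_{ij}|^2\left[\eta(a_i) - \eta(b_j) - (a_i - b_j)\eta'(b_j) - (a_i - b_j)^2/2\right].$$

Since $\eta'(t) = \ln t + 1$ and $\mathrm{tr}\,\rho = \mathrm{tr}\,\sigma = 1$, we have $\mathrm{tr}[\eta(\rho) - \eta(\sigma) - (\rho-\sigma)\eta'(\sigma)] = \mathrm{tr}(\rho\ln\rho - \rho\ln\sigma)$. Summing over $i$ and using the estimate (4.5), we get

$$\mathrm{tr}(\rho\ln\rho - \rho\ln\sigma) \ge \frac{1}{2}\mathrm{tr}(\rho-\sigma)^2 = \frac{1}{2}\|\rho-\sigma\|_2^2,$$

and the lemma is proved.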



In a limited sense, the relative entropy $S(\rho|\sigma)$ can be thought of as a measure of closeness between $\rho$ and $\sigma$ [we say "limited" because $S(\cdot|\cdot)$, just like its classical counterpart, fails to satisfy the triangle inequality]. In this regard we mention the result of Lindblad [81] that, for any channel $T : \mathcal{S}(\mathcal{H}) \to \mathcal{S}(\mathcal{H})$ and for any pair $\rho, \sigma \in \mathcal{S}(\mathcal{H})$, we have $S(T(\rho)|T(\sigma)) \le S(\rho|\sigma)$.
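Both Streater's bound (4.4) and Lindblad's monotonicity can be checked numerically. The NumPy sketch below computes matrix logarithms via the spectral decomposition; to keep $\sigma$ invertible it mixes a little of $I/d$ into each random state, and it uses a depolarizing channel as one hypothetical example of $T$. All helper names are our own.

```python
import numpy as np


def logm_pd(rho):
    """Matrix logarithm of a positive definite Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T


def relative_entropy(rho, sigma):
    """S(rho|sigma) = tr(rho ln rho - rho ln sigma); both assumed invertible."""
    return float(np.trace(rho @ (logm_pd(rho) - logm_pd(sigma))).real)


def depolarize(rho, p):
    """Depolarizing channel rho -> (1-p) rho + p I/d (one example of a channel)."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d


def random_density(d, rng):
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    # Mix in some I/d so the state is safely invertible.
    return 0.9 * rho / np.trace(rho).real + 0.1 * np.eye(d) / d


rng = np.random.default_rng(4)
streater_ok = lindblad_ok = True
for _ in range(500):
    rho, sigma = random_density(3, rng), random_density(3, rng)
    s_rel = relative_entropy(rho, sigma)
    hs2 = np.linalg.norm(rho - sigma, "fro") ** 2   # ||rho - sigma||_2^2
    streater_ok &= s_rel >= 0.5 * hs2 - 1e-12      # Lemma 4.1.2
    p = rng.uniform()
    lindblad_ok &= relative_entropy(depolarize(rho, p), depolarize(sigma, p)) <= s_rel + 1e-12
print(streater_ok, lindblad_ok)
```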

4.2 The Gibbs variational principle and thermodynamic stability

In Section 3.1 we discussed the zeroth law of thermodynamics, which essentially says that any macroscopic system will generally be found in a state of equilibrium, characterized by a few macroscopic parameters. According to the well-known Gibbs variational principle in statistical mechanics [121, p. 348], the equilibrium states of a finite quantum system with Hamiltonian $H$ at absolute temperature $T$ (inverse temperature $\beta$) are precisely those states that minimize the free-energy functional

$$F_\beta(\rho) := \mathrm{tr}(\rho H) - \frac{1}{\beta}S(\rho) \qquad (4.6)$$

(in the case of an infinite system, one would instead minimize the specific free-energy functional, i.e., free energy per "particle").

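It is very easy to see that, when the system in question is finite, the Gibbs state $\rho_\beta$ [cf. Eq. (3.1)] is the unique solution of the variational problem (4.6). Let $\Phi(\beta) := -(1/\beta)\ln Z_\beta$, where $Z_\beta = \mathrm{tr}\,e^{-\beta H}$ is the canonical partition function. Then, for any density operator $\rho$, we have

$$\begin{aligned}
F_\beta(\rho) - \Phi(\beta) &= \mathrm{tr}(\rho H) + \frac{1}{\beta}\left[\mathrm{tr}(\rho\ln\rho) + \ln Z_\beta\right] \\
&= -\frac{1}{\beta}\left[\mathrm{tr}(\rho\ln\rho_\beta) + \ln Z_\beta\right] + \frac{1}{\beta}\left[\mathrm{tr}(\rho\ln\rho) + \ln Z_\beta\right] \\
&= \frac{1}{\beta}\,\mathrm{tr}(\rho\ln\rho - \rho\ln\rho_\beta) \\
&= \frac{1}{\beta}\,S(\rho|\rho_\beta),
\end{aligned}$$

where in the second line we used $\ln\rho_\beta = -\beta H - (\ln Z_\beta)I$. This implies that $F_\beta(\rho_\beta) = \Phi(\beta)$. Now all we need to show is that, for any $\rho \neq \rho_\beta$, $F_\beta(\rho) > \Phi(\beta)$, but this follows immediately from Lemma 4.1.2. The uniqueness of the solution to (4.6) for finite systems makes them unsuitable for the study of macroscopic degeneracy (i.e., when a given macroscopic system has multiple equilibrium states at a given temperature) and phase transitions; it is then necessary to pass to the so-called thermodynamic limit.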
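The identity $F_\beta(\rho) - \Phi(\beta) = S(\rho|\rho_\beta)/\beta$ can be verified numerically for a small system. This NumPy sketch uses a random Hermitian matrix as an illustrative Hamiltonian and an arbitrary value of $\beta$; the helper names are hypothetical, and matrix functions are computed through the spectral decomposition.

```python
import numpy as np


def expm_h(A):
    """Matrix exponential of a Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(A)
    return (v * np.exp(w)) @ v.conj().T


def logm_pd(A):
    """Matrix logarithm of a positive definite Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(A)
    return (v * np.log(w)) @ v.conj().T


def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))


def free_energy(rho, H, beta):
    """F_beta(rho) = tr(rho H) - S(rho)/beta, Eq. (4.6)."""
    return float(np.trace(rho @ H).real) - entropy(rho) / beta


rng = np.random.default_rng(5)
d, beta = 4, 1.7
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
H = (A + A.conj().T) / 2                 # random Hamiltonian (illustrative)

Z = float(np.trace(expm_h(-beta * H)).real)
rho_beta = expm_h(-beta * H) / Z         # Gibbs state
Phi = -np.log(Z) / beta                  # Phi(beta) = -(1/beta) ln Z

# A random full-rank density operator
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho = 0.9 * rho / np.trace(rho).real + 0.1 * np.eye(d) / d

rel = float(np.trace(rho @ (logm_pd(rho) - logm_pd(rho_beta))).real)
print(free_energy(rho, H, beta) - Phi, rel / beta)  # the two numbers agree
print(free_energy(rho_beta, H, beta) - Phi)         # ~0: the Gibbs state is the minimizer
```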

An alternative characterization of equilibrium states is developed through the notions 
of global and local thermodynamic stability [122]. Global thermodynamic stability is 
equivalent to the Gibbs variational principle: states that are globally thermodynamically 
stable (GTS) are precisely those that minimize the specific free-energy functional. On 
the other hand, a state p is locally thermodynamically stable (LTS) if the specific free 
energy of any state a, obtained by local perturbation of p, is greater than that of p. 
It is known that any GTS state is also LTS, but the converse is not generally true for 
an infinite system [122, pp. 31-33]. We thus obtain a useful device for showing that a 
given state is not GTS: namely, showing that it is not LTS. An argument of this kind is 
referred to as an "entropy-energy argument" [125] since showing that the state p is not 
LTS amounts to showing that it is possible to perturb p locally in such a way that the 
resulting change in specific free energy is negative, owing to the fact that the entropy gain 
due to the perturbation overwhelms the corresponding energy shift. Although the goal of 
an entropy-energy argument is to show that a given infinite-volume state is not GTS, it
is often possible to consider only the finite-volume scenario to show that the state is not 
LTS. 

We illustrate a typical entropy-energy argument by giving a heuristic description of the 
argument due to Thouless [136] concerning the absence of ordering in a one-dimensional 
Ising system (spin chain) with short-range interactions. The original argument appeared 
in the text by Landau and Lifshitz [74, p. 537]; the version of Thouless is a refinement 
of their reasoning. Let us assume, to the contrary, that ordering exists; that is, all of 
the spins in the chain point in the same direction. Then, if the ordered phase is stable, 
the corresponding state must minimize the free-energy functional. Now suppose that we 
reverse all the spins in a segment of large (but finite) size N. Due to the short range of the 
interactions, the energy cost of inserting this "macroscopic droplet" is bounded above by 
a constant. Now, if we randomly insert this droplet in any one of $n$ contiguous segments, the entropy gain will be on the order of $\ln n$, so the free-energy change will be bounded from above by $\mathrm{const} - (1/\beta)\ln n$, which will be negative for large $n$. Hence, by means of a local
perturbation of the putative ordered phase, we obtain a state of lower free energy, which 
implies that the ordered phase is unstable. 

Arguments of this kind depend crucially on both the dimension of the model and the range of the interactions. For instance, they are inapplicable whenever there is a possibility that energy may overwhelm entropy. Consider, e.g., a $d$-dimensional Ising model with short-range interactions, where $d \ge 2$. Then, upon being presented with the ordered phase, we flip all the spins in a hypercube of volume $N$, chosen at random out of $n$ contiguous hypercubes. Again, this results in an entropy gain of $\ln n$. However, the energy cost of flipping the spins in a hypercube of volume $N$ will be on the order of its surface area, so that the free-energy change will be on the order of $\mathrm{const}\cdot N^{(d-1)/d} - (1/\beta)\ln n$. In this case the energy shift may very well offset the entropy change.
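The contrast between $d = 1$ and $d \ge 2$ can be made concrete with a toy calculation of how many candidate positions $n$ are needed before the heuristic free-energy change becomes negative. In this sketch the coupling `J`, the temperature and the droplet sizes are hypothetical values (units with $k_B = 1$), and the droplet energy is modeled as $J\,N^{(d-1)/d}$, which is only the scaling used in the argument above.

```python
import math


def droplet_free_energy_change(n, beta, energy_cost):
    """Heuristic Delta F = energy_cost - (1/beta) ln n for a flipped droplet
    placed at random in one of n positions."""
    return energy_cost - math.log(n) / beta


def min_positions(beta, energy_cost):
    """Smallest integer n with Delta F < 0, i.e. n > exp(beta * energy_cost)."""
    return math.floor(math.exp(beta * energy_cost)) + 1


def droplet_energy(N, d, J=1.0):
    """Energy cost ~ J * N**((d-1)/d), the surface of a volume-N droplet.
    J is a hypothetical coupling constant; for d = 1 this is just J."""
    return J * N ** ((d - 1) / d)


beta = 1.0
for d in (1, 2, 3):
    thresholds = [min_positions(beta, droplet_energy(N, d)) for N in (10, 100, 1000)]
    print(d, thresholds)
```

For $d = 1$ the threshold does not depend on the droplet size, so a large enough chain always destabilizes the ordered phase; for $d \ge 2$ it grows like $\exp(\beta J N^{(d-1)/d})$, so the entropy gain need not win.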

4.3 Entropy-energy arguments and quantum information theory

Our goal in this chapter is to incorporate entropy-energy arguments into the framework 
of quantum information science. Our starting point will be Streater's adaptation of an 
entropy-energy argument, described in his monograph [132] on nonequilibrium thermody- 
namics. We briefly illustrate his approach in order to set the stage for our own investiga- 
tion. 

Consider a quantum system $\Sigma$ whose initial state is given by a density operator $\rho$, and let $T$ be an irreversible discrete-time dynamics constructed by mixing reversible evolutions. Then the von Neumann entropy $S$ will increase monotonically along the orbit $\{T^n(\rho)\}$. Let $H$ be the bounded energy observable (Hamiltonian) of $\Sigma$ with the property that its spectral projections are left invariant by $T$. Then the mean energy is conserved along the orbit but, since the entropy keeps on growing, the sequence of iterates $T^n(\rho)$ will eventually converge to the mixture of microcanonical states on the eigenspaces (energy levels) of $H$ for any choice of the initial state $\rho$ (assuming that each eigenvalue of $H$ has finite geometric multiplicity). In the case of an irreversible quantum dynamics $T$ that does not conserve energy, the same argument works as well because, as one can easily show, the absolute value of the energy shift due to $T$ can be crudely bounded from above by twice the operator norm of $H$ (i.e., by twice the largest energy available to $\Sigma$).

Now we present our twist on Streater's reasoning, as well as the essence of our method. 
While it is clear from the above discussion that, for sufficiently high temperatures, entropy 
will eventually overwhelm energy, we would like to obtain an estimate as to when that 
will happen. In order to do so, we appeal to the Gibbs variational principle. Let us, for 
simplicity, assume that the system $\Sigma$ is finite, and is maintained at inverse temperature $\beta$. As we have already pointed out, the energy shift due to the iterated dynamics $T^n$ is bounded from above by $2\|H\|$. Suppose, further, that we have a lower bound on the entropy gain due to $T^n$ in the form $\Delta S \ge f(n)$, where $f$ is an invertible function. Then the free energy will change by at most $2\|H\| - f(n)/\beta$. This number will, in turn, be negative when $f(n) > 2\beta\|H\|$. If $f$ is an increasing function, then so is the inverse function $f^{-1}$. As a consequence of this, entropy will exceed energy for all $n > f^{-1}(2\beta\|H\|)$. Thus we see that, in order to keep the system stable for a long time, we must either increase the energy or lower the temperature (or do both). The exact form of the function $f$ will then allow us to appraise the energy-temperature trade-off involved in keeping the system stable.
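Given any nondecreasing lower bound $f$, the horizon $n > f^{-1}(2\beta\|H\|)$ can be located numerically even when $f^{-1}$ has no closed form. The sketch below does this by integer bisection; the helper `stability_horizon`, the cap `n_limit`, and the example bound $f(n) = \ln n$ are all placeholder assumptions, not results from the text.

```python
import math


def stability_horizon(f, beta, h_norm, n_limit=10**12):
    """Smallest integer n >= 1 with f(n) > 2 * beta * h_norm.

    f is a hypothetical lower bound on the entropy gain after n noisy steps;
    it is assumed nondecreasing. Returns None if no n <= n_limit qualifies.
    """
    target = 2.0 * beta * h_norm
    if f(n_limit) <= target:
        return None
    lo, hi = 0, n_limit  # invariant: the answer lies in (lo, hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if f(mid) > target:
            hi = mid
        else:
            lo = mid
    return hi


# Illustration with an assumed entropy-gain bound f(n) = ln n (in nats):
for beta, h_norm in [(1.0, 1.0), (2.0, 1.0), (1.0, 3.0)]:
    print(beta, h_norm, stability_horizon(math.log, beta, h_norm))
```

Raising $\beta$ (lowering the temperature) or $\|H\|$ pushes the horizon out; with this particular $f$ it grows like $e^{2\beta\|H\|}$, which is the energy-temperature trade-off described above.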

The dynamics of a noisy quantum computer can be pictured as the competition between 
the entangling unitary transformations and the localized errors which tend to destroy 
entanglement, thereby increasing entropy [1]. Because a large-scale quantum computer 
is neither homogeneous in space nor homogeneous in time, there is no straightforward 
way to tackle this problem using the formalism of statistical mechanics of spin systems. 
In particular, the usual notion of the thermodynamic limit no longer applies. Recall, 
however, our remark in the preceding section that it is possible to carry out entropy- 
energy arguments without passing to the thermodynamic limit. In our case we can reason 
as follows. A large-scale quantum computer is, for all practical purposes, a macroscopic 
system. However, the only part of this system of any interest to us consists of the
degrees of freedom directly involved in the actual computation; the number of such degrees
of freedom is ostensibly finite. If we can show that this finite system is not LTS, then the 
entire macroscopic computer cannot be GTS. 

There are two separate aspects of the thermodynamics of noisy quantum computers: the temporal and the spatial. The temporal aspect refers to the maximum number of computational operations that can be carried out before it becomes "thermodynamically unfavorable" for the computational degrees of freedom to be in any state other than the microcanonical (maximally mixed) state. This is, of course, closely tied to the relaxation time. The spatial aspect concerns the size of the computational subsystem as measured in qubits: on a heuristic level, we can expect that, as the subsystem gets larger, there is more room for "randomness" in the locations of the errors, so that it may be possible to show, using a typical entropy-energy argument à la Thouless, that the computer is not LTS. We present analyses of these two aspects in Sections 4.4 and 4.5, devoting the rest of this section to energy shift estimates.

We agree at the outset to deal only with the circuit model of quantum computation, in which case we adopt the model of noisy quantum computation from Section 3.3. Namely, each step of the computation is the application of a unitarily implemented channel (quantum gate), followed by an invocation of a fixed noisy channel $T$. We take $T$ to be strictly contractive and bistochastic. In fact, as we have shown in Section 3.2.4, if the noise modeled by $T$ is sufficiently weak (i.e., $\|T - \mathrm{id}\|_\diamond < \epsilon$ with $\epsilon$ sufficiently small), then there exists a depolarizing channel $D_\eta$ such that $\|T - D_\eta\|_\diamond < \epsilon$ and $\eta < \epsilon/\mathrm{const}$.

Let us first consider the temporal aspect, in which case we are interested in an estimate of the energy shift due to $n$ successive invocations of the noisy channel $T$. When the Hamiltonian $H$ does not depend on time, the estimate is easy: writing $\hat{T}$ for the Heisenberg-picture channel, we have, for any density operator $\rho$,

$$|\Delta E| = \left|\mathrm{tr}[T^n(\rho)H] - \mathrm{tr}(\rho H)\right| \le \left\|\hat{T}^n(H) - H\right\| \le 2\|H\|.$$

If we picture noiseless quantum computation as an evolution governed by the Schrödinger equation, then the corresponding Hamiltonian is manifestly time-dependent. Noisy computation could then be described by a Lindblad master equation [83], but the Hamiltonian part of the corresponding Liouvillian would still be time-dependent. We can, however, circumvent this issue for the following reason. The goal of "temporal" entropy-energy arguments is to obtain an estimate of the maximum number $n_{\max}$ of computational steps that can be carried out before the entropy gain due to the repeated invocations of $T$ overwhelms the energy cost of the computation, which may include any energy resources required to perform error correction. Specifically, we are interested in the expression for $n_{\max}$ in terms of energy and temperature. Therefore we assume that we operate the computer under a maximum energy constraint, i.e., the energy shift can be estimated as $\Delta E \le E_{\max}$ for some given $E_{\max}$. Recalling the discussion above, we will then have $n_{\max} = f^{-1}(\beta E_{\max})$, where $f$ is the function that figures in the lower bound on the entropy gain due to $T^n$.
Now we turn to the analysis of the spatial aspect. Suppose that we have a quantum computer comprising a large number of qubits. Let $T$ be the channel that models the decoherence of a single qubit in the computer. Imagine picking, at random, one out of $n$ disjoint $k$-qubit sets and applying the channel $T^{\otimes k}$ to the qubits in this set. Suppose that the energy shift due to this local perturbation is independent of $k$ and $n$. Then we want to show that if we take $k$ large enough, there will be some finite value of $n$ such that the corresponding entropy gain overwhelms the energy shift. How can we show that the energy shift can, in fact, be bounded independently of $k$ and $n$? We reason as follows. Given an initial state of $N$ qubits, consider a quantum circuit whose size is polynomial in $N$. Each gate in the circuit acts on at most $c$ qubits, where the number $c$ is independent of $N$. Let $\rho_{s-1}$ be the state of the computer given by

$$\rho_{s-1} = \left(T \circ \mathcal{U}_{s-1} \circ \cdots \circ T \circ \mathcal{U}_1\right)(\rho_0), \qquad \mathcal{U}_j(\rho) := U_j \rho U_j^*,$$

where $\rho_0$ is the input state, and $T$ is a fixed noisy channel. Suppose that the $s$th gate has been applied, so we have the transformation $\rho_{s-1} \mapsto U_s \rho_{s-1} U_s^*$. Now, when we invoke the channel $T$, the corresponding energy shift will be determined by the Hamiltonian $H_s$, where $U_s = \exp(-iH_s\tau/\hbar)$ and $\tau$ is the time it takes to apply the $s$th gate. Because, by hypothesis, each gate acts on at most $c$ qubits, the energy shift can be bounded from above by a function of $c$ alone. This assumption can also be justified on the grounds of "local reversibility" [99] (cf. also the "gearbox quantum computer" of DiVincenzo [33]).

We can, in fact, put this energy shift estimate in a broader context of simulation of quantum systems using quantum computers [91, pp. 204-212]. Again, consider a system of $N$ qubits and the Hamiltonian

$$H = \sum_{k=1}^{P(N)} H_k,$$

where $P$ is some polynomial, and each local interaction $H_k$ involves at most $c$ qubits, the number $c$ being, as before, independent of $N$. Then, assuming that we can implement the unitary evolution generated by each term $H_k$ using a circuit whose size is polynomial in $c$, we can simulate the unitary evolution generated by $H$ using high-order approximations [35] of the Lie-Trotter product formula

$$\lim_{n \to \infty}\left(e^{iAt/n}e^{iBt/n}\right)^n = e^{i(A+B)t},$$

which holds for any two matrices $A$, $B$ of the same shape. The main point is that, at each step of the simulation, the number of interacting qubits is bounded from above by a function of $c$ alone.
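A minimal numerical sketch of the first-order Lie-Trotter formula (the higher-order approximations of [35] are not shown). For illustration we assume two Hermitian 2-local terms on two qubits, $A = \sigma_x \otimes \sigma_x$ and $B = \sigma_z \otimes I + I \otimes \sigma_z$, which do not commute, and measure the operator-norm error for increasing $n$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def exp_i(h: np.ndarray, t: float) -> np.ndarray:
    """exp(i t h) for Hermitian h, via the spectral decomposition."""
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(1j * t * w)) @ v.conj().T


def trotter_error(a: np.ndarray, b: np.ndarray, t: float, n: int) -> float:
    """Operator-norm distance between (e^{iAt/n} e^{iBt/n})^n and e^{i(A+B)t}."""
    step = exp_i(a, t / n) @ exp_i(b, t / n)
    approx = np.linalg.matrix_power(step, n)
    return float(np.linalg.norm(approx - exp_i(a + b, t), ord=2))


# Two non-commuting terms, each acting on at most c = 2 qubits.
A = np.kron(X, X)
B = np.kron(Z, I2) + np.kron(I2, Z)
errors = [trotter_error(A, B, t=1.0, n=n) for n in (10, 100, 1000)]
```

The error shrinks roughly like $t^2\|[A,B]\|/(2n)$, and every factor in the product acts on no more qubits than the individual terms do, which is the property the energy-shift argument relies on.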

4.4 Entropy-energy balance and the maximum number of operations

In the preceding section we have argued that, as far as the temporal entropy-energy arguments are concerned, we may assume that the energy shift due to $n$ invocations of the noisy channel $T$ can be bounded from above by some constant $E_{\max}$, which can be thought of as the energy resources available for the computation. In order to proceed with the entropy-energy argument, we need an estimate of the entropy gain due to $T^n$. We state some preliminaries first.

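Let $T : \mathcal{S}(\mathcal{H}) \to \mathcal{S}(\mathcal{H})$ be a channel. Extending the map $T$ to all of $\mathcal{B}(\mathcal{H})$ and treating the latter as a Hilbert space with the Hilbert-Schmidt inner product $\langle A, B\rangle := \mathrm{tr}(A^*B)$, we see that the Heisenberg-picture channel $\hat{T}$ coincides with the adjoint operator $T^*$. Indeed, let $\{V_\alpha\}$ be a Kraus decomposition of $T$. Then we have, for any $A, B \in \mathcal{B}(\mathcal{H})$,

$$\langle A, T(B)\rangle = \sum_\alpha \langle A, V_\alpha B V_\alpha^*\rangle = \sum_\alpha \mathrm{tr}(A^* V_\alpha B V_\alpha^*) = \sum_\alpha \mathrm{tr}(V_\alpha^* A^* V_\alpha B) = \langle \hat{T}(A), B\rangle,$$

which shows that $T^* = \hat{T}$. Furthermore, if $T$ is bistochastic, then the map $\hat{T}T$ is also a bistochastic channel: it is a composition of two completely positive maps, and its Kraus decomposition $\{V_\alpha^* V_\beta\}$ has the proper normalization,

$$\sum_{\alpha,\beta}(V_\alpha^* V_\beta)^*(V_\alpha^* V_\beta) = \sum_\beta V_\beta^*\Big(\sum_\alpha V_\alpha V_\alpha^*\Big)V_\beta = \sum_\beta V_\beta^* V_\beta = I,$$

where we have used the fact that $T$ is bistochastic. In addition, the map $\hat{T}T$ is self-adjoint in the sense that $\langle A, \hat{T}T(B)\rangle = \langle \hat{T}T(A), B\rangle$, and hence diagonalizable. Furthermore, $\hat{T}T$ is a positive operator¹ because, for any $A \in \mathcal{B}(\mathcal{H})$,

$$\langle A, \hat{T}T(A)\rangle = \langle T(A), T(A)\rangle = \mathrm{tr}[T(A)^* T(A)] \ge 0.$$

Because the absolute values of the eigenvalues of any completely positive trace-preserving map do not exceed unity [135], we conclude that the spectrum of $\hat{T}T$ is contained in the interval $[0, 1]$ of the real line. We will need some additional ergodic and spectral properties for the channel $T$, which we summarize in the following definition.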
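These facts are easy to confirm numerically. The sketch below (our own illustration; the mixed-unitary channel on $d = 3$ and the helper names are assumptions) represents a channel as a matrix acting on the row-major vectorization of $X$, in which the Hilbert-Schmidt inner product becomes the ordinary complex inner product. It checks that the matrix of $\hat{T}$ is the conjugate transpose of that of $T$, and that $\hat{T}T$ is Hermitian with spectrum in $[0, 1]$ and top eigenvalue 1.

```python
import numpy as np


def random_unitary(d: int, rng: np.random.Generator) -> np.ndarray:
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)  # Q factor of a square complex matrix is unitary
    return q


def superoperator(kraus) -> np.ndarray:
    """Matrix of X -> sum_a V_a X V_a^* acting on the row-major vectorization of X."""
    return sum(np.kron(v, v.conj()) for v in kraus)


rng = np.random.default_rng(1)
d = 3
probs = [0.5, 0.3, 0.2]
# Hypothetical example: a mixed-unitary channel, which is automatically bistochastic.
kraus = [np.sqrt(p) * random_unitary(d, rng) for p in probs]

S = superoperator(kraus)                              # T (Schroedinger picture)
S_hat = superoperator([v.conj().T for v in kraus])    # T-hat: X -> sum_a V_a^* X V_a
M = S_hat @ S                                         # the map T-hat T
eigs = np.linalg.eigvalsh(M)

# Adjointness <A, T(B)> = <T-hat(A), B> in the Hilbert-Schmidt inner product tr(A^* B).
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
lhs = np.vdot(A.reshape(-1), S @ B.reshape(-1))
rhs = np.vdot(S_hat @ A.reshape(-1), B.reshape(-1))
```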

¹Here we mean operator positivity in the usual Hilbert-space sense, not in the sense that $\hat{T}T$ maps positive operators to positive operators, which it obviously does.
Definition 4.4.1 Let $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ be a bistochastic channel. We say that $T$ is ergodic with spectral gap $\gamma$ if the identity $I$ is, up to scalar multiples, the only fixed point of $T$ in $\mathcal{B}(\mathcal{H})$, and the spectrum of the channel $\hat{T}T$ is contained in the set $[0, 1-\gamma] \cup \{1\}$.

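Our starting point will be the following entropy gain estimate due to Streater [131].

Lemma 4.4.2 (Streater) Let $\mathcal{H}$ be a Hilbert space of finite dimension $d$. If $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ is a bistochastic channel which is ergodic and has spectral gap $\gamma$, then for any $\rho \in \mathcal{S}(\mathcal{H})$

$$S(T(\rho)) - S(\rho) \ge \frac{\gamma}{2}\,\|\rho - I/d\|_2^2. \qquad (4.7)$$

Proof: Given a bistochastic channel $T$, a theorem of Alberti and Uhlmann [3] says that, for any $\rho$, there exist unitaries $U_\alpha$ and nonnegative numbers $p_\alpha$ with $\sum_\alpha p_\alpha = 1$ such that $T(\rho) = \sum_\alpha p_\alpha U_\alpha \rho U_\alpha^*$. Define $\rho_\alpha := U_\alpha \rho U_\alpha^*$. By Lemma 4.1.2 we have, for each $\alpha$,

$$\mathrm{tr}[\rho_\alpha \ln \rho_\alpha - \rho_\alpha \ln T(\rho)] \ge \frac{1}{2}\|\rho_\alpha - T(\rho)\|_2^2.$$

Because $\rho_\alpha$ and $\rho$ are unitarily equivalent, we have $S(\rho_\alpha) = S(\rho)$ for all $\alpha$, and thus

$$\sum_\alpha p_\alpha\,\mathrm{tr}[\rho_\alpha \ln \rho_\alpha - \rho_\alpha \ln T(\rho)] = S(T(\rho)) - S(\rho),$$

which yields

$$S(T(\rho)) - S(\rho) \ge \frac{1}{2}\sum_\alpha p_\alpha\|\rho_\alpha - T(\rho)\|_2^2. \qquad (4.8)$$

We can rewrite Eq. (4.8) as

$$\begin{aligned} S(T(\rho)) - S(\rho) &\ge \frac{1}{2}\sum_\alpha p_\alpha\left[\langle\rho_\alpha,\rho_\alpha\rangle - \langle\rho_\alpha,T(\rho)\rangle - \langle T(\rho),\rho_\alpha\rangle + \langle T(\rho),T(\rho)\rangle\right] \\ &= \frac{1}{2}\left[\langle\rho,\rho\rangle - \langle T(\rho),T(\rho)\rangle\right] \\ &= \frac{1}{2}\langle\rho, (\mathrm{id} - \hat{T}T)(\rho)\rangle. \end{aligned} \qquad (4.9)$$

Because 1 is a simple eigenvalue of $\hat{T}T$ with eigenvector $I$, we can write $\rho$ as the orthogonal (with respect to the Hilbert-Schmidt inner product) direct sum $I/d \oplus (\rho - I/d)$. Then Eq. (4.9) becomes

$$S(T(\rho)) - S(\rho) \ge \frac{1}{2}\langle\rho - I/d, (\mathrm{id} - \hat{T}T)(\rho - I/d)\rangle \ge \frac{\gamma}{2}\|\rho - I/d\|_2^2,$$

where the last inequality follows from the fact that, on the orthogonal complement of $I$, every eigenvalue of $\mathrm{id} - \hat{T}T$ is at least $\gamma$. We thus obtain Eq. (4.7), and the lemma is proved. ■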
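Streater's bound (4.7) can be probed numerically. The sketch below assumes a hypothetical bistochastic channel on $d = 3$, namely a random mixed-unitary channel blended with a small amount of complete depolarization so that ergodicity is guaranteed. It reads off $\gamma$ as one minus the second largest eigenvalue of $\hat{T}T$ and checks (4.7), with entropies in nats, on random pure and mixed states.

```python
import numpy as np


def random_unitary(d: int, rng: np.random.Generator) -> np.ndarray:
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)  # Q factor of a square complex matrix is unitary
    return q


def entropy(rho: np.ndarray) -> float:
    """von Neumann entropy in nats (natural logarithm)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))


rng = np.random.default_rng(2)
d, eta = 3, 0.1
vec_id = np.eye(d).reshape(-1)

# Hypothetical channel on row-major vec(X):
#   T(X) = (1 - eta) * [0.6 U1 X U1^* + 0.4 U2 X U2^*] + eta * tr(X) I/d
unitaries = [random_unitary(d, rng) for _ in range(2)]
S = (1 - eta) * sum(p * np.kron(u, u.conj()) for p, u in zip([0.6, 0.4], unitaries))
S = S + eta * np.outer(vec_id, vec_id) / d

# Spectral gap: the spectrum of T-hat T = S^dagger S lies in [0, 1 - gamma] U {1}.
spectrum = np.sort(np.linalg.eigvalsh(S.conj().T @ S))
gamma = 1.0 - spectrum[-2]

worst_margin = np.inf
for trial in range(300):
    if trial % 3 == 0:
        v = rng.normal(size=d) + 1j * rng.normal(size=d)
        rho = np.outer(v, v.conj())              # pure state
    else:
        a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
        rho = a @ a.conj().T                     # generic mixed state
    rho = rho / np.trace(rho).real
    t_rho = (S @ rho.reshape(-1)).reshape(d, d)
    gain = entropy(t_rho) - entropy(rho)
    bound = 0.5 * gamma * np.linalg.norm(rho - np.eye(d) / d, "fro") ** 2
    worst_margin = min(worst_margin, gain - bound)
```

`worst_margin` is the smallest observed value of $S(T(\rho)) - S(\rho) - \tfrac{\gamma}{2}\|\rho - I/d\|_2^2$; by the lemma it should never be negative beyond round-off.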

Remark: Notice that the theorem of Alberti and Uhlmann cited in the proof above does not imply that a channel is bistochastic if and only if it is a convex combination of unitarily implemented channels. While it is obvious that any convex combination of unitary conjugations is a bistochastic channel, Landau and Streater [75] showed that the converse is not true in general when the dimension of the underlying Hilbert space is greater than 2. The Alberti-Uhlmann theorem only says that if $T$ is a bistochastic channel, then for each $\rho \in \mathcal{S}(\mathcal{H})$ there exist $\rho$-dependent unitaries $U_\alpha$ and nonnegative weights $p_\alpha$ such that $T(\rho) = \sum_\alpha p_\alpha U_\alpha \rho U_\alpha^*$. □

Lemma 4.4.2 says that the von Neumann entropy is a strict Liapunov function for any bistochastic channel which is ergodic with a spectral gap. Hence Theorem 3.1.4 can be applied to show the trace-norm convergence of the orbit $\{T^n(\rho)\}$ to the maximally mixed state $I/d$ for any $\rho \in \mathcal{S}(\mathcal{H})$; the rate of convergence is controlled by the spectral gap. The next result shows the connection between ergodic bistochastic channels on $\mathcal{M}_2$ and strictly contractive channels.

Theorem 4.4.3 Let $T : \mathcal{M}_2 \to \mathcal{M}_2$ be a bistochastic channel. If $T$ is strictly contractive, then it is ergodic with spectral gap $\gamma = 1 - \kappa^2$, where $\kappa$ is the contractivity modulus. Conversely, if $T$ is ergodic with spectral gap $\gamma$, then it is strictly contractive with $\kappa = \sqrt{1-\gamma}$.
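Proof: First we need a lemma.

Lemma 4.4.4 If $T : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ is a bistochastic channel, then so is the dual map $\hat{T} : \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$, in the sense that $\hat{T}$ is a completely positive trace-preserving unital map.

Proof: Let $\{V_\alpha\}$ be a Kraus decomposition of $T$, so that, for any $X \in \mathcal{B}(\mathcal{H})$, we have $T(X) = \sum_\alpha V_\alpha X V_\alpha^*$. Then we have $\sum_\alpha V_\alpha^* V_\alpha = I$ because $T$ is trace-preserving, and also $\sum_\alpha V_\alpha V_\alpha^* = I$ because $T$ is bistochastic. Now, for any $A \in \mathcal{B}(\mathcal{H})$, we have

$$\mathrm{tr}\,\hat{T}(A) = \sum_\alpha \mathrm{tr}(V_\alpha^* A V_\alpha) = \mathrm{tr}\Big(A\sum_\alpha V_\alpha V_\alpha^*\Big) = \mathrm{tr}\,A, \qquad \hat{T}(I) = \sum_\alpha V_\alpha^* V_\alpha = I.$$

Hence $\hat{T}$ is a completely positive trace-preserving unital map, i.e., a bistochastic channel. ■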

Now let $T : \mathcal{M}_2 \to \mathcal{M}_2$ be a strictly contractive bistochastic channel, and let $\mathcal{V}$ be its interaction algebra (cf. Section 3.4.2). Theorem 3.4.4 then says that $\mathcal{V}' = \mathbb{C}I$, where $\mathcal{V}'$ is the commutant of $\mathcal{V}$. By Lemma 4.4.4, the Heisenberg-picture channel $\hat{T}$ is trace-preserving, and leaves invariant the maximally mixed state $I/2$, which is an invertible matrix. Then, according to the Fannes-Nachtergaele-Werner theorem (cf. Refs. [15] and [40] and the remark after Theorem 3.4.4), the set of operators in $\mathcal{B}(\mathcal{H})$ left invariant by $\hat{T}$ is precisely the commutant $\mathcal{V}'$. Thus the only operators in $\mathcal{M}_2$ that are left invariant by $\hat{T}$ are the complex multiples of the identity matrix, i.e., $T$ is ergodic. (In fact, the same argument can be used to show that any bistochastic strictly contractive channel is ergodic.)
Now recall from Section 3.2.3 that if $T$ is a bistochastic channel on $\mathcal{M}_2$, then there exists a real $3 \times 3$ matrix $\mathbf{T}$ such that, for any density operator

$$\rho = \frac{1}{2}(I + \mathbf{w} \cdot \boldsymbol{\sigma}), \qquad (4.10)$$

we will have

$$T(\rho) = \frac{1}{2}\left[I + (\mathbf{T}\mathbf{w}) \cdot \boldsymbol{\sigma}\right].$$

Furthermore, if $T$ is strictly contractive, then the contractivity modulus $\kappa$ is precisely the operator norm (the largest singular value) $\|\mathbf{T}\|$ of $\mathbf{T}$. Now consider the channel $\hat{T}T$, whose action on the density operator (4.10) is given by

$$(\hat{T}T)(\rho) = \frac{1}{2}\left[I + (\mathbf{T}^{t}\mathbf{T}\mathbf{w}) \cdot \boldsymbol{\sigma}\right].$$

Since $T$ is strictly contractive, so is $\hat{T}T$; this follows from the fact that, for any two density operators $\rho, \rho'$, we have

$$\|(\hat{T}T)(\rho - \rho')\|_1 \le \|T(\rho - \rho')\|_1 \le \kappa\|\rho - \rho'\|_1.$$

Then the discussion above applies, and $\hat{T}T$ is ergodic. Furthermore, since $\hat{T}T$ is self-adjoint, its second largest eigenvalue equals $\|\mathbf{T}^{t}\mathbf{T}\|$. But $\|\mathbf{T}^{t}\mathbf{T}\| = \|\mathbf{T}\|^2 = \kappa^2$, which implies that $1 - \gamma = \kappa^2$. Thus we have shown that if $T$ is a bistochastic strictly contractive channel, then it is ergodic with the spectral gap $\gamma = 1 - \kappa^2$. We skip the proof of the converse statement, because it is quite similar to this argument. ■
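The identity $1 - \gamma = \kappa^2$ can be checked on a concrete example. In the sketch below the qubit channel is a hypothetical choice: Pauli noise with probabilities $(0.7, 0.15, 0.1, 0.05)$ followed by a fixed real rotation $U$, which is unital by construction. The code computes the Bloch matrix $\mathbf{T}_{ij} = \tfrac{1}{2}\mathrm{tr}[\sigma_i T(\sigma_j)]$, takes $\kappa$ as its largest singular value, and compares $\kappa^2$ with the second largest eigenvalue of $\hat{T}T$.

```python
import numpy as np

PAULI = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def superoperator(kraus) -> np.ndarray:
    """Matrix of X -> sum_a V_a X V_a^* acting on the row-major vectorization of X."""
    return sum(np.kron(v, v.conj()) for v in kraus)


def apply(S: np.ndarray, x: np.ndarray) -> np.ndarray:
    return (S @ x.reshape(-1)).reshape(x.shape)


# Hypothetical unital qubit channel: Pauli noise followed by a fixed real rotation U.
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]], dtype=complex)
probs = [0.7, 0.15, 0.1, 0.05]
kraus = [np.sqrt(p) * U @ P for p, P in zip(probs, PAULI)]
S = superoperator(kraus)

# Bloch matrix: T_ij = (1/2) tr[sigma_i T(sigma_j)] for i, j in {x, y, z}.
bloch = np.array([[0.5 * np.trace(PAULI[i] @ apply(S, PAULI[j])).real
                   for j in (1, 2, 3)] for i in (1, 2, 3)])
kappa = np.linalg.norm(bloch, ord=2)               # largest singular value
spectrum = np.sort(np.linalg.eigvalsh(S.conj().T @ S))
gamma = 1.0 - spectrum[-2]
```

Here the Pauli part has Bloch matrix $\mathrm{diag}(0.7, 0.6, 0.5)$ and the rotation leaves singular values unchanged, so one expects $\kappa = 0.7$ and $\gamma = 1 - 0.49$.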

As far as the entropy gain estimate goes, Theorem 4.4.3 has the following useful corollary.

Corollary 4.4.5 Let $\mathcal{H} = (\mathbb{C}^2)^{\otimes N}$, and consider the channel $T := R^{\otimes N}$, where $R : \mathcal{M}_2 \to \mathcal{M}_2$ is a strictly contractive bistochastic channel. Then, for any $\rho \in \mathcal{S}(\mathcal{H})$ and for any positive $n$, we have

$$S(T^n(\rho)) - S(\rho) \ge \frac{1 - \kappa^{2n}}{2}\,\|\rho - I/2^N\|_2^2, \qquad (4.11)$$

where $\kappa$ is the contractivity modulus of $R$.

Proof: Because $R$ is bistochastic and strictly contractive, so is $T$ (cf. Section 3.2.3). Hence, $T$ is ergodic. Furthermore, the contractivity modulus of $T$ equals that of $R$. Therefore, in order to prove Eq. (4.11), all we need to do is to estimate the spectral gap of $T^n$ and then apply Lemma 4.4.2.

If $\kappa$ is the contractivity modulus of $R$, then the contractivity modulus of $R^n$ (and hence of $T^n$) is at most $\kappa^n$. Now if $1 - \gamma$ is the second largest eigenvalue of $\hat{R}^n R^n$, then Theorem 4.4.3 implies that $\gamma \ge 1 - \kappa^{2n}$. But

$$\hat{T}^n T^n = (\hat{R}^{\otimes N})^n (R^{\otimes N})^n = (\hat{R}^n R^n)^{\otimes N},$$

and the eigenvalues of a tensor power are products of the eigenvalues of the factor, all of which lie in $[0, 1]$ with the largest equal to 1. Hence the second largest eigenvalue of $\hat{T}^n T^n$ equals that of $\hat{R}^n R^n$. Thus the spectral gap of $T^n$ is at least $1 - \kappa^{2n}$, and the corollary is proved. ■
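The tensor-power step of the proof can be illustrated numerically. The sketch below assumes a hypothetical single-qubit Pauli channel $R$ with Bloch matrix $\mathrm{diag}(0.7, 0.6, 0.5)$, so $\kappa = 0.7$, and $N = 3$ qubits. It compares the spectral gap of $\hat{R}^n R^n$ with that of its threefold Kronecker power, and both with $1 - \kappa^{2n}$.

```python
from functools import reduce

import numpy as np

PAULI = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

# Hypothetical R: Pauli channel with Bloch matrix diag(0.7, 0.6, 0.5), so kappa = 0.7.
probs = [0.7, 0.15, 0.1, 0.05]
S_R = sum(p * np.kron(P, P.conj()) for p, P in zip(probs, PAULI))
kappa = 0.7
n_qubits = 3


def second_largest(m: np.ndarray) -> float:
    return float(np.sort(np.linalg.eigvalsh(m))[-2])


gaps = {}
for n in (1, 2, 3):
    S_Rn = np.linalg.matrix_power(S_R, n)
    M_single = S_Rn.conj().T @ S_Rn                  # R-hat^n R^n
    # The superoperator of T-hat^n T^n with T = R^{(x)N} equals this Kronecker power up
    # to a permutation of the vectorization basis, which does not change the spectrum.
    M_tensor = reduce(np.kron, [M_single] * n_qubits)
    gaps[n] = (1 - second_largest(M_single), 1 - second_largest(M_tensor))
```

Each entry of `gaps` holds the single-qubit and $N$-qubit spectral gaps for a given $n$; for this diagonal example the bound $\gamma \ge 1 - \kappa^{2n}$ is attained with equality.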

Now we are ready to proceed with our entropy-energy argument. Suppose that we have a quantum computer operating on $N$ qubits at the inverse temperature $\beta$. We assume that the input to the computer is given by an effective pure state

$$\rho = (1-\epsilon)I/2^N + \epsilon|\psi\rangle\langle\psi|. \qquad (4.12)$$

Let us adopt the typical model of local stochastic noise [1], meaning that the noisy channel $T$ has the form $R^{\otimes N}$, where $R$ is a bistochastic channel on $\mathcal{M}_2$. We will further assume that $R$ is strictly contractive. Given the "energy resources," or maximum allowed energy cost, $E_{\max}$, we are interested in the maximum number $n_{\max}$ of computational steps that can be carried out before the balance of energy vs. entropy tips in favor of the latter. We claim that if $\beta$, $E_{\max}$, $\epsilon$, and $N$ are such that

$$2\beta E_{\max} < \epsilon^2(1 - 2^{-N}), \qquad (4.13)$$

then

$$n_{\max} = \frac{\log\{1 - 2\beta E_{\max}/[\epsilon^2(1 - 2^{-N})]\}}{2\log\kappa}, \qquad (4.14)$$

where $\kappa$ is the contractivity modulus of $R$. Indeed, we have

$$\|\rho - 2^{-N}I\|_2^2 = \epsilon^2\,\mathrm{tr}\left(|\psi\rangle\langle\psi| - 2^{-N}I\right)^2 = \epsilon^2(1 - 2^{-N}).$$

Using this result and Corollary 4.4.5, we can bound the entropy gain due to $T^n$ from below as follows:

$$S(T^n(\rho)) - S(\rho) \ge \frac{\epsilon^2(1 - \kappa^{2n})(1 - 2^{-N})}{2}.$$

The corresponding change in the free energy is then

$$2\beta\Delta F \le 2\beta E_{\max} - \epsilon^2(1 - \kappa^{2n})(1 - 2^{-N}).$$

The right-hand side of this expression will be negative when $n > n_{\max}$, with $n_{\max}$ given by Eq. (4.14). When the number $N$ of qubits in the computer is very large, and when $\epsilon$ does not depend on $N$, the term $2^{-N}$ can be neglected, so Eqs. (4.13) and (4.14) become, respectively,

$$2\beta E_{\max} < \epsilon^2$$

and

$$n_{\max} = \frac{\log(1 - 2\beta E_{\max}/\epsilon^2)}{2\log\kappa}.$$
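A small calculator for Eq. (4.14), offered as a sketch: it raises an error when condition (4.13) fails, and a companion function evaluates the right-hand side of the free-energy bound so one can confirm that the sign change happens at $n_{\max}$. The parameter values ($\beta = 1$, $E_{\max} = 0.1$, $\epsilon = 1$, $N = 60$, $\kappa = 0.9$) and the function names are arbitrary illustrative choices, in the same (unspecified) units as the text.

```python
import math


def n_max_local_noise(beta: float, e_max: float, eps: float,
                      n_qubits: int, kappa: float) -> float:
    """Eq. (4.14): step count beyond which the free-energy bound is negative."""
    c = eps**2 * (1.0 - 2.0 ** (-n_qubits))
    if not 2.0 * beta * e_max < c:
        raise ValueError("condition (4.13) fails: 2*beta*E_max >= eps^2 (1 - 2^-N)")
    return math.log(1.0 - 2.0 * beta * e_max / c) / (2.0 * math.log(kappa))


def free_energy_bound(n: float, beta: float, e_max: float, eps: float,
                      n_qubits: int, kappa: float) -> float:
    """Right-hand side of 2 beta dF <= 2 beta E_max - eps^2 (1 - kappa^{2n}) (1 - 2^-N)."""
    return (2.0 * beta * e_max
            - eps**2 * (1.0 - kappa ** (2.0 * n)) * (1.0 - 2.0 ** (-n_qubits)))


params = dict(beta=1.0, e_max=0.1, eps=1.0, n_qubits=60, kappa=0.9)
n_star = n_max_local_noise(**params)
```

With these values the $2^{-N}$ correction is negligible and $n_{\max} = \log(0.8)/(2\log 0.9) \approx 1.06$; the short lifetime reflects the strong noise ($\kappa = 0.9$ per step) assumed in the example.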

The condition (4.13) restricts the range of applicability of our entropy-energy argument to low-energy quantum computers that are operated at high temperatures. We can, however, dispense with Eq. (4.13) altogether when the noisy channel $T$ has the form

$$T = (1-\delta)\,\mathrm{id} + \delta R^{\otimes N}$$

for some small positive $\delta$, where $R$ is a strictly contractive bistochastic channel on $\mathcal{M}_2$ with the contractivity modulus $\kappa$. Then we have

$$T^n = \sum_{l=0}^{n}\binom{n}{l}(1-\delta)^{n-l}\delta^l\,(R^{\otimes N})^l.$$

Let the input state be given by Eq. (4.12) and suppose, as before, that $N$ is sufficiently large and that $\epsilon$ does not depend on $N$. Then, using the concavity of the von Neumann entropy as well as Corollary 4.4.5, we obtain the bound

$$S(T^n(\rho)) - S(\rho) \ge \frac{\epsilon^2}{2}\sum_{l=0}^{n}\binom{n}{l}(1-\delta)^{n-l}\delta^l(1 - \kappa^{2l}) = \frac{\epsilon^2}{2}\left[1 - (1 - \delta + \delta\kappa^2)^n\right] = \frac{\epsilon^2}{2}\,n\delta(1-\kappa^2) + o(\delta),$$

where $o(\delta)$ stands, as usual, for terms that go to zero faster than $\delta$ as $\delta \to 0$. By the assumed smallness of $\delta$, we may neglect these terms. Then the free-energy increment can be calculated from

$$2\beta\Delta F \le 2\beta E_{\max} - \epsilon^2 n\delta(1-\kappa^2),$$

so that we obtain

$$n_{\max} = \frac{2\beta E_{\max}}{\epsilon^2\delta(1-\kappa^2)}.$$

This formula shows clearly that, in order to keep the computer stable, we must either lower the temperature or raise the energy (or, perhaps, do both). The good news is that, when the noise is weak, the required thermodynamic resources are polynomial (in fact, linear) in the number of operations.
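The weak-noise formula can be evaluated directly. The sketch below (hypothetical function names; parameter values chosen only for illustration) computes $n_{\max} = 2\beta E_{\max}/[\epsilon^2\delta(1-\kappa^2)]$ and also compares the entropy lower bound before and after dropping the $o(\delta)$ terms. By Bernoulli's inequality, $(1-a)^n \ge 1 - na$, the linearized bound is never smaller than the un-linearized one, so the linear formula should be read as a first-order estimate.

```python
def n_max_weak_noise(beta: float, e_max: float, eps: float,
                     delta: float, kappa: float) -> float:
    """n_max = 2 beta E_max / [eps^2 delta (1 - kappa^2)], first order in delta."""
    return 2.0 * beta * e_max / (eps**2 * delta * (1.0 - kappa**2))


def entropy_bound_exact(n: int, eps: float, delta: float, kappa: float) -> float:
    """(eps^2/2) [1 - (1 - delta + delta kappa^2)^n], before dropping o(delta) terms."""
    return 0.5 * eps**2 * (1.0 - (1.0 - delta + delta * kappa**2) ** n)


def entropy_bound_linear(n: int, eps: float, delta: float, kappa: float) -> float:
    """(eps^2/2) n delta (1 - kappa^2), the first-order approximation."""
    return 0.5 * eps**2 * n * delta * (1.0 - kappa**2)
```

For instance, $\beta = 1$, $E_{\max} = 0.5$, $\epsilon = 1$, $\delta = 0.01$, $\kappa = 0.5$ gives $n_{\max} = 400/3 \approx 133$, and doubling $\beta$ (halving the temperature) doubles it.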



4.5 Thermodynamic stability of large-scale quantum computers

In Section 4.4 we used an entropy-energy argument to estimate the maximum number of 
operations that can be carried out on a noisy quantum computer before the cumulative 
entropy gain due to decoherence overwhelms the energy cost of the computation, including 
error correction. Here we consider the spatial aspect of decoherence in quantum computers 
and show, using an entropy-energy argument, that there exists an upper bound on the 
number of qubits that can be accommodated by circuit-based quantum computers. 

What we present here is an adaptation of the classical entropy-energy argument (cf. Simon and Sokal [125]) that shows the absence of ordering in a one-dimensional Ising ferromagnet with short-range interactions. The main ingredient of the Simon-Sokal argument is the following entropy gain estimate. Consider a finite portion of a chain of classical Ising spins divided into $n$ disjoint segments, each containing $k$ spins. Now suppose that we choose, at random, a $k$-spin segment and flip all of the spins in it. The resulting entropy gain is easily seen to be on the order of $\ln n$. Simon and Sokal then give a lower bound on the entropy gain when the segments are "almost disjoint," in the sense that the corresponding configurations can be associated with probability measures with "minimally overlapping" supports. The following lemma and its proof are a straightforward adaptation of a similar result due to Simon and Sokal [125].
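To see where the $\ln n$ comes from, the following sketch represents commuting density operators by diagonal probability vectors (so von Neumann entropy reduces to Shannon entropy). The sizes $n = 8$ segments and $m = 5$ configurations per segment are arbitrary illustrative choices. With disjoint supports, picking a segment at random adds exactly $\ln n$ to the average entropy; with fully overlapping supports the gain is smaller, though never negative and never above $\ln n$.

```python
import numpy as np


def shannon(p: np.ndarray) -> float:
    """Entropy in nats of a probability vector (zeros ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


rng = np.random.default_rng(3)
n, m = 8, 5                                   # n segments, m configurations each

# Disjoint supports: distribution i lives on its own block of m configurations.
disjoint = np.zeros((n, n * m))
for i in range(n):
    disjoint[i, i * m:(i + 1) * m] = rng.dirichlet(np.ones(m))

# Overlapping supports: all distributions live on the same m configurations.
overlap = np.zeros((n, n * m))
overlap[:, :m] = rng.dirichlet(np.ones(m), size=n)


def mixing_gain(rows: np.ndarray) -> float:
    """S(average) - average S: entropy gained by choosing a segment at random."""
    return shannon(rows.mean(axis=0)) - float(np.mean([shannon(r) for r in rows]))


gain_disjoint = mixing_gain(disjoint)
gain_overlap = mixing_gain(overlap)
```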

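Lemma 4.5.1 (Entropy of statistical mixtures) Let $\rho_i$, $i = 1, \ldots, n$, be $n$ mutually commuting density operators. Suppose that there exists a constant $\kappa > 0$ such that, for each $i$, we have

$$\mathrm{tr}\Big(\sum_{j \ne i}\rho_j\rho_i\Big) \le \kappa. \qquad (4.15)$$

Let $\rho = n^{-1}\sum_{i=1}^{n}\rho_i$. Then

$$S(\rho) \ge n^{-1}\sum_{i=1}^{n} S(\rho_i) + \ln n - 2\kappa^{1/2}. \qquad (4.16)$$

Proof: Let $\{q_\alpha^{(i)}\}$ be the eigenvalues of $\rho_i$ (in a common eigenbasis). We have then

$$\begin{aligned}
S(\rho) - n^{-1}\sum_i S(\rho_i) - \ln n &= -n^{-1}\sum_i \mathrm{tr}\,\rho_i\Big(\ln\sum_j\rho_j - \ln\rho_i\Big) \\
&= -n^{-1}\sum_i\sum_\alpha q_\alpha^{(i)}\ln\frac{\sum_j q_\alpha^{(j)}}{q_\alpha^{(i)}} \\
&\ge -2n^{-1}\sum_i\ln\sum_\alpha\Big[\big(q_\alpha^{(i)}\big)^2 + q_\alpha^{(i)}\sum_{j \ne i}q_\alpha^{(j)}\Big]^{1/2} \qquad (4.17) \\
&\ge -2n^{-1}\sum_i\ln\Big[1 + \sum_\alpha\Big(q_\alpha^{(i)}\sum_{j \ne i}q_\alpha^{(j)}\Big)^{1/2}\Big] \qquad (4.18) \\
&\ge -2n^{-1}\sum_i\sum_\alpha\Big(q_\alpha^{(i)}\sum_{j \ne i}q_\alpha^{(j)}\Big)^{1/2} \qquad (4.19) \\
&= -2n^{-1}\sum_i\mathrm{tr}\Big(\sum_{j \ne i}\rho_j\rho_i\Big)^{1/2}, \qquad (4.20)
\end{aligned}$$

where (4.17) is a consequence of Jensen's inequality [57] and the convexity of the function $x \mapsto -\ln x$, (4.18) uses $(a+b)^{1/2} \le a^{1/2} + b^{1/2}$, and (4.19) uses $\ln(x+1) \le x$. For $i$ fixed, we have, for the trace in (4.20),

$$\mathrm{tr}\Big(\sum_{j \ne i}\rho_j\rho_i\Big)^{1/2} \le \Big(\mathrm{tr}\sum_{j \ne i}\rho_j\rho_i\Big)^{1/2}, \qquad (4.21)$$

which follows from the concavity of the function $x \mapsto \sqrt{x}$ and the self-adjointness of the operator $\sum_{j \ne i}\rho_j\rho_i$ (the latter is a consequence of the fact that the $\rho_i$ are mutually commuting). The right-hand side of (4.21) can now be bounded from above by $\kappa^{1/2}$, and the lemma is proved. ■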

Remark: The requirement that the density operators $\rho_i$ be "almost disjoint" is made precise by Eq. (4.15). □

Consider now a quantum computer operating on $N$ qubits. In Section 4.3 we have already noted that, during each computational step, the number of interacting qubits is no greater than some fixed constant $c$ that depends on the particular algorithm, but not on $N$. Hence it follows from the considerations of local reversibility [99] that the energy shift due to any noisy channel applied right after the computational step can be bounded above by a function of $c$ alone. Another important consequence of local reversibility is the observation that we can partition the set $Q := \{1, \ldots, N\}$ into disjoint sets $C_l$, $l \in \{1, \ldots, L\}$, such that, at some stage of the computation, the overall state of the computer will be a separable state of the form

$$\rho = \bigotimes_{l=1}^{L}\rho_l, \qquad (4.22)$$

where $\rho_l$ is a state of the qubits indexed by elements of $C_l$. This observation can be justified as follows. An input state for a typical quantum network is the pure state

$$\bigotimes_{i=1}^{N}|0\rangle\langle 0|,$$

which is manifestly separable. The quantum network responsible for the computation consists of one- and two-qubit gates, which explains the formation of disjoint clusters of qubits. Furthermore, we can picture the noisy computation as a competition between the entangling gates, which tend to form clusters of qubits, and the errors, which tend to detach qubits from clusters [1]. We assume, for simplicity, that the clusters all have the same size $d$, so that $N = Ld$.

Now suppose that we are given two positive integers, $k$ and $n$, such that $L \ge nk$. Thus our computer is operating on at least $nkd$ qubits. Partition the set $\{1, \ldots, nk\} \subset \{1, \ldots, L\}$ into $n$ disjoint $k$-element subsets $S_i$, $i = 1, \ldots, n$. For some $\eta \in (0, 1)$, let $D_\eta^{(l)}$ denote the depolarizing channel

$$D_\eta^{(l)}(A) := (1-\eta)A + \eta 2^{-d}(\mathrm{tr}\,A)I$$

acting on the qubits in the cluster $C_l$. For each $i$, let $T_i$ be the channel that acts as $\bigotimes_{l \in S_i} D_\eta^{(l)}$ on the qubits in $\bigcup_{l \in S_i} C_l$ and as the identity channel on the rest of the qubits. For the state $\rho$ given by Eq. (4.22), define the density operators $\rho_i := T_i(\rho)$, $1 \le i \le n$. Then we have $[\rho_i, \rho_j] = 0$ for all $i, j$. We also note the following elementary estimate:

$$\sum_{j \ne i}\mathrm{tr}(\rho_i\rho_j) = \sum_{j \ne i}\mathrm{tr}[T_i(\rho)T_j(\rho)] \le (n-1)\left(1 - \eta + \eta 2^{-d}\right)^{2k}.$$

Given $n$, $d$, and $\eta$, we can always find a value of $k$ such that

$$\left(1 - \eta + \eta 2^{-d}\right)^{2k} < \frac{1}{n},$$
so that the condition (4.15) of Lemma 4.5.1 is satisfied by the $\rho_i$'s with $\kappa = 1$. Hence the entropy gain due to the channel $T := n^{-1}\sum_{i=1}^{n} T_i$ can be bounded from below as follows:

$$S(T(\rho)) - S(\rho) \ge \frac{1}{n}\sum_{i=1}^{n} S(\rho_i) - S(\rho) + \ln n - 2 \ge \ln n - 2,$$

where the last inequality follows from extensivity of the von Neumann entropy and from the fact that we can write

$$\rho = \bigotimes_{l=1}^{L}\rho_l = \bigotimes_{i=1}^{n}\Big(\bigotimes_{l \in S_i}\rho_l\Big) \otimes \bigotimes_{l=nk+1}^{L}\rho_l,$$

so that each $\rho_i$ differs from $\rho$ only by depolarizing some of the tensor factors, which cannot decrease their entropy; hence $S(\rho_i) \ge S(\rho)$ for every $i$. The corresponding free-energy increment can then be bounded by

$$\beta\Delta F \le \beta E(c) - \ln n + 2, \qquad (4.23)$$

where $E(c)$ is the $c$-dependent upper bound on the energy shift. We can certainly pick $n$ so large that the right-hand side of Eq. (4.23) is negative. With the appropriate choice of $k$, we see that, if the computer operates on at least $nkd$ qubits, then it is possible to "depolarize" a randomly chosen set of qubit clusters in such a way that the resulting entropy gain will overwhelm the corresponding energy shift.
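The choice of $n$ in Eq. (4.23) is explicit: the right-hand side is negative exactly when $n > \exp(\beta E(c) + 2)$. The sketch below (hypothetical function names; $E(c)$ is treated as a given number) returns the smallest such integer.

```python
import math


def min_segments(beta: float, energy_bound: float) -> int:
    """Smallest integer n with beta*E(c) - ln n + 2 < 0, i.e. n > exp(beta*E(c) + 2)."""
    return math.floor(math.exp(beta * energy_bound + 2.0)) + 1


def free_energy_rhs(n: int, beta: float, energy_bound: float) -> float:
    """Right-hand side of Eq. (4.23)."""
    return beta * energy_bound - math.log(n) + 2.0
```

With $\beta E(c) = 1$ one needs $n \ge 21$, and with $\beta E(c) = 2$ one needs $n \ge 55$. The exponential dependence on $\beta E(c)$ is why this size limitation bites mainly at high temperatures.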



Remark: In order for our entropy-energy argument to work, it is crucial that the 
energy shift be bounded independently of n and k (the latter actually depends on 
n). As we have argued above, this bound can be justified for circuit-based quantum 
computation. We also draw the reader's attention to the following caveat. The validity 
of the entropy-energy argument just presented rests chiefly on the assumption that, at
some stage in the computation, the state of the computer can be written in the form 
(4.22). This is certainly true for circuit-based computers, either because the initial state 
is completely separable, or because an initially entangled state is rendered separable by 
noise. □ 



Notice that up to now we have left unspecified the value of the depolarization rate $\eta$ without sacrificing much of our argument. We can, however, pick $\eta$ in such a way that the state of the computer will be separable with high probability. This follows from the work of Aharonov [1], who showed that when the computer is realized as a circuit in $d + 1$ dimensions, there exists a critical value $\eta_c \in [1/3, 1/2^{1/d}]$ such that, for all $\eta > \eta_c$, the state of the computer will eventually end up in the form (4.22) with a large value of $L$. Our entropy-energy argument can therefore be used to elucidate the thermodynamical underpinnings of the process by which noisy quantum computers tend towards essentially classical behavior.



4.6 Putting it all in perspective 

The results reported in this chapter help shed some light on the thermodynamics of noisy quantum computation. We have seen, in particular, that, when the noise affecting the system is modeled by a bistochastic strictly contractive channel, there is an intimate connection between the contraction rate and the rate of entropy production. This lends further support to our claim that strictly contractive channels serve as a physically reasonable model of relaxation processes in noisy quantum memories and computers.

We also showed that there is an upper bound on the number of qubits that can be 
accommodated in a circuit-based quantum computer. The existence of this bound was 
shown under two crucial assumptions: (a) that the energy shift due to a single step of 
the noisy dynamics does not depend on the number of qubits, and (b) that, at some 
point, the state of the computer is separable. As we have emphasized, both of these 
assumptions are justified for the case of quantum circuits. However, this limitation on the 
size of the computer is significant only at high temperatures, as can be easily seen from 
Eq. (4.23). On the other hand, Ozawa showed recently [91] that conservation laws impose 
a lower bound on the size of quantum computers. Unlike the competing upper bound 
implied by our entropy-energy argument, Ozawa's bound is independent of temperature, 
which suggests that operating quantum computers at low temperature may go a long way 
towards curtailing the effects of decoherence. 

Another approach to stabilization of quantum memories and computers would call for 
replacing circuit-based computation with the following procedure [18, 146]: upon prepar- 
ing a multiparticle entangled state, a suitable set of observables is measured on it, and 
the measurement results are processed on a classical computer. While any computation 
performed using this technique is manifestly irreversible, it has the advantage of being less 
susceptible to the effects of noise by virtue of cutting down the computation time. 



CHAPTER 5 



Information storage in quantum spin systems



In Chapter 4 we considered the situation when the effect of noise is such that the entropy produced exceeds the resulting energy shift, at which point it becomes thermodynamically unfavorable for the computer to be in any state other than the maximally mixed (microcanonical) state, or a mixture of such states on the energy eigenspaces. In this chapter we briefly comment on the possibility of reliable storage of information in quantum spin systems that can undergo a phase transition. In this case it can be shown, using the so-called Peierls argument [41, 54, 98], that the energy shift does, in fact, overwhelm the entropy gain.

Our goal here is to reinterpret the results of rigorous perturbation theory for quantum 
spin systems in the context of quantum information processing. We hope that these 
preliminary findings might spur further research into this topic both in the quantum 
information community and in the statistical mechanics community. The contents of this 
chapter follow Ref. [105] essentially verbatim. 

5.1 Toric codes and error correction on the physical level

Error correction is a key ingredient in any good recipe for quantum information processing. 
Many ingenious schemes have been invented to that effect. A particularly interesting 
approach has been suggested by Kitaev and colleagues in a series of beautiful papers [17, 
32, 66], namely the possibility of implementing quantum error correction on the physical 
level. 

Consider a $k \times k$ square lattice $\Lambda$ on the torus $(\mathbb{Z}/k\mathbb{Z})^2$. Associate a qubit with each edge of $\Lambda$, for a total of $n = 2k^2$ qubits. We can identify two kinds of geometric objects on $\Lambda$, vertices and faces; Fig. 5.1 shows a portion of the lattice together with a vertex $v$, a face $F$, and the edges incident with $v$ and $F$. It is easy to see that there are $k^2$ vertices and $k^2$ faces.

Figure 5.1: Square lattice on a torus.

Given a vertex $v$, we denote by $\Gamma(v)$ the set of all edges of $\Lambda$ incident with $v$. Similarly, given a face $F$, we let $\partial F$ denote the boundary of $F$. For any $v$, the set $\Gamma(v)$ contains exactly four edges; the same can be said of $\partial F$ for any $F$. Now define the verification operators

$$A_v := \bigotimes_{e \in \Gamma(v)}\sigma_e^1, \qquad B_F := \bigotimes_{e \in \partial F}\sigma_e^3,$$

where $\sigma_e^i$ denotes the Pauli matrix $\sigma_i$ acting on the Hilbert space of the qubit associated to the edge $e$. It is easy to see that all of these $n$ operators commute with each other, and are self-adjoint with eigenvalues $\pm 1$.

Let $\mathcal{H}$ be the Hilbert space of the $n$ qubits on the lattice and consider the protected subspace

$$\mathcal{K} := \{\psi \in \mathcal{H} \mid A_v\psi = \psi,\ B_F\psi = \psi\ \ \forall v, F\}.$$

There are two relations connecting the operators $A_v$ and $B_F$, namely $\prod_v A_v = I_{\mathcal{H}}$ and $\prod_F B_F = I_{\mathcal{H}}$. Hence there are $m = n - 2$ independent verification operators. Using the theory of the so-called stabilizer codes [51], it can be shown that the dimension of the protected subspace is equal to $2^{n-m} = 4$.

In Ref. [67], Kitaev proposed the following approach to quantum error correction. He considered the Hamiltonian

$$H_\Lambda := -\sum_v A_v - \sum_F B_F, \qquad (5.1)$$

where the summations run over all the vertices and faces of $\Lambda$. Note that this Hamiltonian is formed by 4-spin interactions, namely the verification operators. The ground state of the Hamiltonian (5.1) is fourfold degenerate, and the corresponding eigenspace is precisely the protected subspace $\mathcal{K}$. We can therefore store a state of 2 qubits as a vector in $\mathcal{K}$.
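The count $2^{n-m} = 4$ can be checked mechanically. The plain-Python sketch below uses an edge-indexing scheme and function names of our own choosing. It encodes each $A_v$ and $B_F$ by the set of edges it acts on, as a bit mask. It verifies that every vertex operator and face operator overlap on an even number of edges (which is why they commute), and counts independent checks as the sum of the GF(2) ranks of the two families. Summing the two ranks is legitimate because $A_v$ involves only $\sigma^1$ factors and $B_F$ only $\sigma^3$ factors.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a list of bit-mask rows (Python ints)."""
    basis = {}                                   # leading bit -> reduced row
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead in basis:
                r ^= basis[lead]                 # clear the leading bit
            else:
                basis[lead] = r
                break
    return len(basis)


def to_mask(edges):
    mask = 0
    for e in edges:
        mask |= 1 << e
    return mask


def toric_checks(k):
    """Supports of A_v (stars) and B_F (faces) on a k x k torus, as edge bit masks."""
    def h(i, j):                                 # horizontal edge (i, j) -> (i, j + 1)
        return (i % k) * k + (j % k)

    def v(i, j):                                 # vertical edge (i, j) -> (i + 1, j)
        return k * k + (i % k) * k + (j % k)

    stars, faces = [], []
    for i in range(k):
        for j in range(k):
            stars.append(to_mask([h(i, j), h(i, j - 1), v(i, j), v(i - 1, j)]))
            faces.append(to_mask([h(i, j), h(i + 1, j), v(i, j), v(i, j + 1)]))
    return stars, faces


def logical_qubits(k):
    stars, faces = toric_checks(k)
    n = 2 * k * k
    m = gf2_rank(stars) + gf2_rank(faces)        # independent sigma^1- and sigma^3-type checks
    return n, m, n - m
```

For every $k \ge 3$ tried, this yields $m = n - 2$ and hence two encoded qubits, i.e., a four-dimensional protected subspace.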

Addition of a small local perturbation given by the sum of certain single-spin terms and 2-spin interactions (cf. Ref. [67] for details) modifies the Hamiltonian $H_\Lambda$ to $H_\Lambda(\epsilon)$, where $\epsilon$ is the perturbation strength. The effect of the perturbation is to introduce an energy splitting between the degenerate ground-state levels of the unperturbed Hamiltonian. Kitaev then argues that there exists a constant $\epsilon_0$ such that, at low temperatures and for all $|\epsilon| < \epsilon_0$, the energy splitting for sufficiently large $\Lambda$ is given by $\exp(-ck)$ for some positive $c$. In other words, in the thermodynamic limit $\Lambda \uparrow \mathbb{Z}^2$, the ground state is still fourfold degenerate, and any sufficiently weak perturbation is "washed out" by the system itself.

The four-body interactions comprising the Hamiltonian (5.1) were originally considered by Kitaev in Ref. [66] as a basis for the construction of a family of $[[n, 2]]$ stabilizer codes, which he termed "toric codes." The remarkable feature of toric codes is the fact that, despite their apparent nonoptimality (in the sense of Calderbank and Shor [19]), they require only local operations for their implementation and can correct any number of errors (provided that the lattice is large enough). The bulk of Kitaev's analysis of toric codes was concerned with their properties as "conventional" quantum error-correcting codes [68] that require active intervention through frequent measurements and other external processing. The issue of constructing "self-correcting" quantum spin systems on the basis of toric codes has been taken up again only very recently by Dennis et al. [32]. Their approach, however, is centered around the topological features of toric codes and delves deep into such subjects as nonabelian gauge theory [9, 32].

On the other hand, the very idea of physical error correction is so tantalizing, both 
practically and conceptually, that one cannot help but wonder: how generic are phenomena 
of this kind? In this chapter we show that a few results in statistical mechanics of quantum 
spin systems point towards the conclusion that physical error correction is fairly common 
in such systems, under quite reasonable conditions. 

5.2 Laying out the ingredients 

First of all, let us agree on the ingredients necessary for the analysis of a self-correcting quantum spin system. Let $\Lambda \subset \mathbb{Z}^\nu$, where $\nu \geq 2$, be a finite lattice. Let $\mathcal{H}$ be the $(2S+1)$-dimensional Hilbert space of a single particle of spin $S$. Spins are situated on the lattice sites $l \in \Lambda$ (in Kitaev's construction, spins were located on the lattice bonds). In order to retain a superficial analogy with stabilizer codes, we will assume that the unperturbed Hamiltonian is classical, i.e., the interactions comprising it generate an abelian subalgebra of the algebra $\mathcal{B}(\mathcal{H}_\Lambda)$ of all linear operators on $\mathcal{H}_\Lambda := \bigotimes_{l \in \Lambda} \mathcal{H}_l$, where each $\mathcal{H}_l$ is an isomorphic copy of $\mathcal{H}$. That is,

$$H_\Lambda = \sum_{M \subset \Lambda} \Phi_M,$$

where each $\Phi_M$ is a self-adjoint operator on $\mathcal{H}_M := \bigotimes_{l \in M} \mathcal{H}_l$ and $[\Phi_M, \Phi_{M'}] = 0$. We assume periodic boundary conditions [that is, the lattice $\Lambda$ is drawn on the torus $(\mathbb{Z}/k\mathbb{Z})^\nu$, where $k$ is the lattice size]. We let $\{|\alpha\rangle\}$ be the orthonormal basis of $\mathcal{H}_\Lambda$ in which $H_\Lambda$ is diagonal; the basis vectors are labelled by classical spin configurations, $\alpha = \{\sigma_l\}_{l \in \Lambda}$ with $\sigma_l \in \{-S, -S+1, \ldots, S-1, S\}$. We also assume that the smallest eigenvalue of $H_\Lambda$ is equal to zero, and that its geometric multiplicity is $m \geq 2$. We denote the corresponding eigenspace by $\mathcal{G}_\Lambda$.
The effect of errors is modeled by introducing an off-diagonal perturbation term into the Hamiltonian:

$$H_\Lambda(\epsilon) := H_\Lambda + \epsilon P,$$

where $\epsilon$ is a positive constant and $P$ is a self-adjoint operator whose exact form is, for the moment, left unspecified. Addition of the $\epsilon P$ term will perturb the eigenvalues of $H_\Lambda$, resulting in an energy splitting between orthogonal ground states of the original (unperturbed) Hamiltonian. Consequently, we define

$$\Delta E_\Lambda(\epsilon) := \max_{|\alpha\rangle \in \mathcal{G}_\Lambda} \langle\alpha|H_\Lambda(\epsilon)|\alpha\rangle.$$

Thus the basic idea behind a self-correcting quantum spin system boils down to the following. Information is stored in the ground-state eigenspace of the unperturbed Hamiltonian $H_\Lambda$. The multiplicity $m$ is dictated by the desired storage capacity: when $m = 2^k$, our "ground-state memory cell" will hold $k$ qubits. Errors will cause some of the information to leak out into excited states. In order for error correction to take place, the system should be able to recover its ground state from sufficiently weak perturbations at sufficiently low temperatures (low temperatures are needed because the information resides in the ground state). That is, we hope that there exists a threshold value $\epsilon_0$ such that

$$\lim_{\Lambda \uparrow \mathbb{Z}^\nu} \Delta E_\Lambda(\epsilon) = 0, \qquad \epsilon < \epsilon_0. \qquad (5.2)$$

However, this condition is necessary but not sufficient for error correction. It may happen that the $m$-fold degeneracy of the ground state does not survive in the thermodynamic limit [this possibility arises from the off-diagonal matrix elements $\langle\alpha|H_\Lambda(\epsilon)|\alpha'\rangle$, where $|\alpha\rangle, |\alpha'\rangle \in \mathcal{G}_\Lambda$]. Therefore we require that the ground state of the perturbed Hamiltonian remain $m$-fold degenerate for all $\epsilon < \epsilon_0$ in the thermodynamic limit. In the next section we elaborate further on these requirements for self-correction and show that they are quite easy to fulfill in a wide variety of quantum spin systems.



5.3 Putting it together 

The main question is: which restrictions ensue on the unperturbed Hamiltonian $H_\Lambda$ and on the perturbation $P$? It turns out that this question can be answered using the same methods that are employed for constructing low-temperature phase diagrams for classical spin systems with quantum perturbations [13, 29, 64]. Namely, the Hamiltonian $H_\Lambda$ should be composed of $n$-spin interactions (for fixed $n$) that satisfy the Peierls condition [127]: the energy cost of a local perturbation $\omega'$ of a translationally invariant ground state $\omega$ is on the order of the surface area of the region that encloses the part of the lattice on which $\omega$ and $\omega'$ differ. Additionally, the unperturbed Hamiltonian is assumed to have a spectral gap $g > 0$ (i.e., its first nonzero eigenvalue is at least $g$). Admissible perturbations are formed by sums of translates of an arbitrary self-adjoint operator $P_0$ whose support (the set of sites on which the action of $P_0$ is nontrivial) is finite and encloses the origin of the lattice. Thus

$$P = \sum_{l \in \Lambda} P_l,$$

where $P_l = \gamma_l(P_0)$, with $\gamma_l$ being the automorphism induced by the translation of the lattice $\Lambda$ that maps the origin to the site $l$ and respects the periodic boundary conditions. Also, both the unperturbed and the perturbed Hamiltonians are assumed to be invariant under unitary transformations induced by a symmetry group acting transitively on the set of ground-state configurations, $\{|\alpha\rangle\} \cap \mathcal{G}_\Lambda$.

Assuming these conditions are satisfied, we invoke a theorem of Kennedy and Tasaki [64], which says that there exists a constant $\epsilon_0$ such that, for all $\epsilon < \epsilon_0$, the perturbed spin system has $m$ translationally invariant ground states in the thermodynamic limit. Furthermore, if the $m$ translationally invariant ground states of the unperturbed Hamiltonian are invariant under some additional symmetries, these invariance properties carry over to the ground states of the perturbed system. The threshold value $\epsilon_0$ of the parameter $\epsilon$ is determined by developing a low-temperature expansion [50, 64] of the perturbed partition function using a modified Lie-Trotter product formula:



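$$\mathrm{tr}\, e^{-\beta H_\Lambda(\epsilon)} = \lim_{N \to \infty} \mathrm{tr} \left[ e^{-(\beta/N) H_\Lambda} \left( 1 - \frac{\beta}{N}\, \epsilon P \right) \right]^N.$$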



The trace is expanded in the basis $\{|\alpha\rangle\}$, thus allowing for combinatorial analysis on a "space-time" grid, where the space axis is labelled by the lattice sites $l$ and the time axis is labelled by the values $0, \beta/N, 2\beta/N, \ldots, \beta$. The perturbation theory is controlled by a suitable coarse-graining of the time axis, which then allows one to determine the threshold value $\epsilon_0$ that will render the contributions of the perturbation terms $P_l$ sufficiently small. In fact, the translation-invariance requirements can be lifted, with the perturbation theory still going through [64]. A similar space-time analysis of the error rate has been described heuristically by Dennis et al. [32].
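The modified Lie-Trotter formula can be checked numerically on a toy model. In the sketch below the "classical" Hamiltonian is an illustrative $4 \times 4$ diagonal matrix with a twofold-degenerate zero ground level, $P$ is a random real symmetric matrix, and $\beta = 1$, $\epsilon = 0.1$; all of these are assumptions made for the example. Exponentials of the symmetric matrices are computed from their eigendecompositions.

```python
import numpy as np


def herm_expm(A, t):
    """exp(-t A) for a real symmetric matrix A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(-t * w)) @ V.T


def trotter_partition_function(H, P, eps, beta, N):
    """tr [exp(-(beta/N) H) (1 - (beta/N) eps P)]^N, the modified Lie-Trotter approximant."""
    step = herm_expm(H, beta / N) @ (np.eye(len(H)) - (beta / N) * eps * P)
    return np.trace(np.linalg.matrix_power(step, N))


rng = np.random.default_rng(0)
H = np.diag([0.0, 0.0, 1.0, 2.0])   # diagonal ("classical") H with a twofold zero ground level
M = rng.standard_normal((4, 4))
P = (M + M.T) / 2                    # hypothetical self-adjoint perturbation
eps, beta = 0.1, 1.0

exact = np.trace(herm_expm(H + eps * P, beta))   # tr exp(-beta H(eps))
errors = [abs(trotter_partition_function(H, P, eps, beta, N) - exact) / exact
          for N in (10, 100, 1000)]
print(errors)
```

Because the factor $1 - (\beta/N)\epsilon P$ agrees with $e^{-(\beta/N)\epsilon P}$ only to first order, the relative error of the approximant should shrink roughly like $1/N$.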

Another important issue is the following: while the infinite-volume ground state of the perturbed system may retain the degeneracy of the original (unperturbed) ground state, the degeneracy may be lost when the lattice has finite size. This phenomenon, referred to as obscured symmetry breaking [70], is characterized by the fact that the low-lying eigenstates of the finite-system Hamiltonian converge to additional ground states in the thermodynamic limit. In this case we will have, for any finite $\Lambda$,

$$\Delta E_\Lambda(\epsilon) > 0.$$

It is therefore important to obtain an estimate of the convergence rate in (5.2); namely, given some $\delta > 0$, find $N_0$ such that

$$\Delta E_\Lambda(\epsilon) < \delta, \qquad |\Lambda| > N_0.$$

Knowing the convergence rate allows us to appraise the resources needed to implement error correction with the desired accuracy $\delta$. In this respect, an estimate of the form

$$\Delta E_\Lambda(\epsilon) = e^{-c|\Lambda|}, \qquad \epsilon < \epsilon_0, \qquad (5.3)$$

where the constant $c$ depends on $\epsilon$, would be ideal: an exponential gain in error-correction accuracy could then be achieved with polynomial resources. This exponential convergence rate is, in fact, one of the most attractive features of Kitaev's construction in Ref. [67].
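To see in numbers what an exponential gain with polynomial resources means, one can solve $\Delta E_\Lambda(\epsilon) < \delta$ for the smallest admissible $|\Lambda|$ under the law (5.3) and, for comparison, under a slower inverse-size law $c/|\Lambda|$. The sketch below sets $c = 1$ purely for illustration; in practice $c$ depends on $\epsilon$ and on the model.

```python
import math


def min_size_exponential(delta, c):
    """Smallest |Lambda| with exp(-c |Lambda|) < delta, as in Eq. (5.3)."""
    return math.floor(math.log(1.0 / delta) / c) + 1


def min_size_inverse(delta, c):
    """Smallest |Lambda| with c / |Lambda| < delta (a slower, power-law rate)."""
    return math.floor(c / delta) + 1


for delta in (1e-3, 1e-6, 1e-9):
    print(f"delta={delta:.0e}  exponential: {min_size_exponential(delta, 1.0):>4d}"
          f"  inverse: {min_size_inverse(delta, 1.0):>11d}")
```

Under the exponential law each factor of 1000 in accuracy costs only about 7 additional sites, whereas under the inverse-size law it multiplies the required lattice size by 1000.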
On the other hand, the rate at which $\Delta E_\Lambda(\epsilon)$ converges to zero is determined by the unperturbed Hamiltonian $H_\Lambda$, the perturbation $P$, and the perturbation strength $\epsilon$. It is therefore important to know what we can expect in a generic setting. Obviously, the exponential convergence, as in Eq. (5.3), is optimal, but it may well turn out that the particular implementation (e.g., with a different Hamiltonian) does not allow for it. We can, however, hope for a slower (but still quite decent) convergence rate. According to a theorem of Horsch and von der Linden [62], certain quantum spin systems possess low-lying eigenstates of the finite-lattice Hamiltonian with

$$\Delta E_\Lambda(\epsilon) = c/|\Lambda|.$$

The conditions for this to hold are the following. There has to exist an order observable $O_\Lambda$ of the form

$$O_\Lambda = \sum_{l \in \Lambda} O_l,$$

where each $O_l$ is a self-adjoint operator such that $[O_l, O_{l'}] = 0$. Furthermore, for any interaction term $\Phi_M$ in the perturbed Hamiltonian $H_\Lambda(\epsilon)$ (these also include the perturbation terms), we must have $[\Phi_M, O_l] = 0$ unless $l \in M$. The operators $\Phi_M$ and $O_l$ are required to be uniformly bounded (in $M$ and $l$, respectively), and the cardinality of the support set $M$ must not exceed some fixed constant $C$ (the latter condition also has to be fulfilled for the perturbation theory described above to converge). Finally, if $\psi$ is an eigenstate of $H_\Lambda(\epsilon)$, then we must have $\langle\psi|O_\Lambda|\psi\rangle = 0$, but $\langle\psi|O_\Lambda^2|\psi\rangle \geq \zeta|\Lambda|^2$ (here the constant $\zeta$ depends on $O_l$). The latter conditions are taken as manifestations of obscured symmetry breaking. Examples of systems for which the Horsch-von der Linden theorem holds include [70] the Ising model in a transverse magnetic field and the Heisenberg antiferromagnet with Néel order.
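The transverse-field Ising model gives a small numerical illustration of obscured symmetry breaking. The sketch below assumes a periodic chain ($\nu = 1$, chosen only because exact diagonalization is cheap there; it is not one of the lattices with $\nu \geq 2$ considered in this chapter) with classical part $-\sum_l Z_l Z_{l+1}$, off-diagonal perturbation $-\epsilon \sum_l X_l$, order observable $O_\Lambda = \sum_l Z_l$, and $\epsilon = 0.5$ as an assumed value inside the ordered phase. Here $Z$ and $X$ are the Pauli matrices $\sigma^z$ and $\sigma^x$, and the zero of energy is not shifted to the ground level.

```python
import numpy as np
from functools import reduce

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])


def site_op(op, i, n):
    """op acting on spin i of an n-spin chain, identity elsewhere."""
    return reduce(np.kron, [op if j == i else np.eye(2) for j in range(n)])


def tfim_diagnostics(n, eps):
    """Splitting of the two lowest levels, <O> and <O^2>/n^2 in the ground state.

    H = -sum_i Z_i Z_{i+1} - eps * sum_i X_i on a periodic chain,
    with order observable O = sum_i Z_i.
    """
    H = np.zeros((2**n, 2**n))
    for i in range(n):
        H -= site_op(Z, i, n) @ site_op(Z, (i + 1) % n, n)
        H -= eps * site_op(X, i, n)
    O = sum(site_op(Z, i, n) for i in range(n))
    energies, states = np.linalg.eigh(H)
    psi = states[:, 0]                     # finite-volume ground state
    return energies[1] - energies[0], psi @ O @ psi, (psi @ O @ O @ psi) / n**2


results = {n: tfim_diagnostics(n, eps=0.5) for n in (4, 6, 8, 10)}
for n, (split, mean_O, mean_O2) in results.items():
    print(f"n={n:2d}  splitting={split:.3e}  <O>={mean_O:+.1e}  <O^2>/n^2={mean_O2:.3f}")
```

For these chain lengths the splitting of the two lowest levels should fall off roughly geometrically with $|\Lambda|$, $\langle O_\Lambda\rangle$ should vanish up to rounding (the finite-volume ground state is symmetric under the global spin flip), and $\langle O_\Lambda^2\rangle/|\Lambda|^2$ should stay of order one.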

5.4 Summary 

Where does it all take us? It appears, from the discussion in the preceding section, that any quantum spin system whose Hamiltonian is formed by mutually commuting $n$-body interactions that satisfy the Peierls condition can recover from sufficiently weak quantum (i.e., off-diagonal) perturbations at low temperatures. The admissible perturbations can be either finite-range [61] or exponentially decaying [13]. Under these (quite general) conditions, it follows from rigorous perturbation theory for quantum spin systems that there exists a critical perturbation strength $\epsilon_0$ such that, for all $\epsilon < \epsilon_0$, the degeneracy and the symmetry properties of the ground state of the original (unperturbed) system survive in the thermodynamic limit. Furthermore, even if ground-state degeneracy is removed by perturbation of the finite-size system, the effect of the error (perturbation) is effectively "washed out" in the thermodynamic limit, as the low-lying excited states of the perturbed system converge to additional ground states.

However, the systems we have considered were assumed to have classical Hamiltonians and discrete symmetries. What about truly quantum Hamiltonians and continuous symmetry (e.g., the quantum Heisenberg model)? The situation here is not so easy. For instance, it is apparent from our discussion that, in order to be self-correcting, the perturbed system must exhibit an order-disorder transition as the parameter $\epsilon$ is tuned: in the "ordered phase," error correction is possible; in the "disordered phase," occurrence of errors results in irrevocable loss of information. (This has already been noted by Dennis et al. [32].) Since we require the ground state of the perturbed Hamiltonian to exhibit the same degeneracy as the corresponding state of the unperturbed Hamiltonian, it makes sense to talk about spontaneously broken symmetry in the ordered phase (i.e., when $\epsilon < \epsilon_0$). However, according to the Goldstone theorem [73], spontaneous breaking of a continuous symmetry is accompanied by gapless excitations, so a continuous symmetry cannot be spontaneously broken in a system with a gap. It would certainly be worthwhile to explore physical error correction in systems with continuous symmetries as well, but the models in which it can work will not be as easy to find.



CHAPTER 6 



Conclusion 



The now-fashionable field of "physics of information and computation" is at least 73 years 
old if we count from 1929, the year when Leo Szilard published his seminal paper [133] on 
Maxwell's demon. Today, the amount of published work in this area is astounding; to be sure, we have gained quite a bit (pun intended) of new knowledge and new insights, owing to the continuous cross-fertilization between the disciplines of mathematics, physics, and computer science. However, we have hardly made a dent in the Big Problem of harnessing
the enormous information-processing potential of quantum-mechanical systems. Reliable 
storage of quantum information still remains the paramount challenge. 

In this dissertation, we presented a systematic study of the dynamical aspects of in- 
formation storage in quantum-mechanical systems. We have already outlined in Chapter 
1 the way in which dynamics (statistical dynamics, in particular) relates to our investiga- 
tion of information storage in noisy quantum registers and computers. Let us therefore 
elaborate on the "big picture." 

First of all, what do we mean by "information"? We take a very simple approach: we refer to any assignment of an initial state (density operator) as the information stored in the register (or supplied to the computer). Statistical dynamics then comes in useful as we
attempt to follow the orbit traced over time by this initial information in the state space of 
the register (computer). Now, given any pair of different initial states, the states along the 
corresponding orbits in the noiseless case will be distinguishable from one another exactly 
to the same extent as the input states. This, in general, will not be the case when noise 
is present. It is precisely this feature that is central in our analysis of the noisy dynamics. 

In a sense, quantum information science is a nonequilibrium theory: it deals with 
systems whose quantum-mechanical states are not necessarily the thermodynamically fa- 
vorable ones. We believe, therefore, that an important aspect of the noisy dynamics of 
quantum registers and computers is the tendency towards equilibrium. Since the rate of 
convergence to equilibrium is expected to be quite rapid (and this is precisely the behavior 
we have shown strictly contractive dynamics to exhibit), any active intervention (such as 
error correction) would have to take place extremely often. In fact, if the computational 
network is very elaborate, we may expect most of it to be taken up by the degrees of
freedom that are responsible for keeping the computation stable. However, because these 
degrees of freedom are also susceptible to the noisy dynamics, the overall trend toward 
equilibrium will still be present. 

In this dissertation, the noise afflicting the quantum register (computer) was modeled 
by a strictly contractive channel [104, 106]. This model is justified for several reasons, the 
main one being the rapid convergence of disjoint orbits toward each other. This behavior 
naturally leads towards ergodicity and mixing, two important ingredients in the theory 
of approach to equilibrium. Furthermore, strictly contractive channels give us a way 
to incorporate the crucial assumption of finite precision of any experimentally available 
apparatus into the mathematical model of a physically realizable (i.e., nonideal) quantum 
computer. We have shown, in particular, that no two states of such a computer can 
be distinguished from one another with absolute certainty, even if they are maximally 
distinguishable in the noiseless case. Finally, we have shown that, given any channel
T, there will always be a strictly contractive channel T' in any neighborhood of T in the 
cb-norm topology. Using the fidelity measure we have developed for quantum channels 
[103], we showed that, for any channel T, there always exists a strictly contractive channel 
T' that cannot be distinguished from T by any experimental means. We then went on to 
demonstrate that, in the absence of error correction, the sensitivity of quantum memories 
and computers to strictly contractive errors would grow exponentially with storage time 
and computation time respectively, and would depend only on the contraction rate and on 
the measurement precision. We proved that strict contractivity rules out the possibility of 
perfect error correction, and gave an argument that approximate error correction, which 
covers previous work on fault-tolerant quantum computation as a special case, is possible. 
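As a minimal illustration of strictly contractive noise (our own example, not taken from the analysis summarized above), the qubit depolarizing channel $T(\rho) = (1-p)\rho + p\,\mathrm{tr}(\rho)\,I/2$ contracts the trace distance between any two states by exactly the factor $1-p$, since $T(\rho) - T(\sigma) = (1-p)(\rho - \sigma)$. The sketch iterates it on the orthogonal states $|0\rangle\langle 0|$ and $|1\rangle\langle 1|$, which are perfectly distinguishable at the start; the error rate $p = 0.01$ and the 500 steps are arbitrary choices.

```python
import numpy as np


def depolarize(rho, p):
    """Qubit depolarizing channel T(rho) = (1 - p) rho + p tr(rho) I/2."""
    return (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2


def trace_norm(A):
    """||A||_1 for a Hermitian matrix: the sum of absolute eigenvalues."""
    return np.sum(np.abs(np.linalg.eigvalsh(A)))


p = 0.01                      # illustrative error rate
rho = np.diag([1.0, 0.0])     # |0><0|
sigma = np.diag([0.0, 1.0])   # |1><1|, orthogonal to rho: ||rho - sigma||_1 = 2
distances = []
for step in range(1, 501):
    rho, sigma = depolarize(rho, p), depolarize(sigma, p)
    distances.append(trace_norm(rho - sigma))
print(distances[0], distances[-1])
```

After $n$ steps the trace distance is $2(1-p)^n$: the two states cease to be perfectly distinguishable after a single application, and the remaining distinguishability decays exponentially with the number of steps.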

We have then applied our model to the problem of determining the threshold error 
rate for noisy quantum computation. If the noise is sufficiently weak, we may model the 
decoherence mechanism by a depolarizing channel, the error rate being precisely the rate 
of depolarization. We would like to emphasize that we did not make any assumptions 
about the specific procedure employed for error correction, nor did we appeal to combina- 
torial considerations. The threshold error rate was shown to depend on the measurement 
precision and on the physical circuit depth. We presented some numerical estimates for 
the threshold error rate for the case when the measurement precision is on the order of 
the standard quantum limit, and found that, even with such ridiculously precise measure- 
ments, the maximum tolerable error rate would drop to zero extremely rapidly for circuits 
of polynomial and superpolynomial physical depth. 

After having described the general properties of strictly contractive channels, along 
with implications for quantum information processing, we took up the following question 
[107]. How does strict contractivity relate to the balance of energy and entropy in a 
noisy quantum register (computer)? We found that there is a close connection between 
the contraction rate of a channel and the rate of entropy production in a noisy quantum 
computer. We adapted the so-called "entropy-energy arguments" in order to determine 
the maximum number of operations that can be carried out reliably on a noisy quantum 
computer in terms of energy and temperature, thus enabling us to judge the thermody- 
namic cost of keeping the computer stable. Ideally we would like to do error correction 
as infrequently as possible; the longer the relaxation time, the closer we will be to this 
goal. We also proved that, under certain conditions, there exists an upper bound on the
number of qubits in a circuit-based quantum computer. 

Finally we looked into the possibility of using quantum spin systems with phase transi- 
tions for reliable storage of quantum information [105]. Our inspiration came from Kitaev's 
idea [67] to store quantum information in the degenerate ground state of a system of in- 
teracting anyons on a periodic lattice. Proper treatment of the ground states calls for the 
analysis of low-temperature behavior of quantum spin systems, where the main issue is not 
entropy, but rather the energy fluctuations above the ground-state level, caused by quan- 
tum perturbations. We have, in particular, addressed the following question: what kinds 
of interactions are admissible for constructing such quantum memory devices, and what 
are the perturbations against which these memories will be stable? We indicated that a 
few results in rigorous statistical mechanics of quantum spin systems [112] point toward 
the conclusion that such "self-correction" is fairly common in quantum spin systems with 
Hamiltonians that are composed of interactions satisfying the so-called Peierls condition (the standard example being an Ising-type Hamiltonian), the admissible perturbations being either finite-range or exponentially decaying.

Most of the results we have presented in this dissertation are of a somewhat negative 
nature. The implications, however, are more of a blessing than a curse for the future 
of quantum information processing. We believe that the successful solution of problems 
faced by researchers in this field will require models of computers far more ingenious than
networks of one- and two-qubit gates. As we have mentioned in Chapter 3, massively 
parallel systems of interacting particles (quantum cellular automata) may well prove to 
be a viable medium for the experimental realization of large-scale quantum computers. 



Appendix A 

Mathematical background 



A.1 C*-algebras

We summarize here the absolute minimum of the C*-algebra theory. For the mathematical 
treatment, the reader is referred to the books by Bratteli and Robinson [16], Conway [24], 
and Davidson [30], and for the C*-algebras in the context of quantum theory and statistical 
mechanics to the books by Emch [:)S], Haag [56], and Streater [132]. 
First we give a few definitions. 

Definition A.1.1 An algebra is a complex linear space $\mathcal{A}$, equipped with a product operation $(A, B) \in \mathcal{A} \times \mathcal{A} \mapsto AB \in \mathcal{A}$ such that, for all $A, B, C \in \mathcal{A}$ and all $\alpha, \beta \in \mathbb{C}$, (1) $A(BC) = (AB)C$, (2) $A(B + C) = AB + AC$ and $(B + C)A = BA + CA$, (3) $(\alpha\beta)(AB) = (\alpha A)(\beta B)$. The product operation does not have to be commutative; an algebra with a commutative product is called abelian or simply commutative.

Definition A.1.2 An algebra with identity, or a unital algebra, is an algebra $\mathcal{A}$ with an element $1 \in \mathcal{A}$ such that $A1 = 1A = A$ for any $A \in \mathcal{A}$ (such an element is necessarily unique).

Definition A.1.3 An involution on an algebra $\mathcal{A}$ is a mapping $A \in \mathcal{A} \mapsto A^* \in \mathcal{A}$ such that, for all $A, B \in \mathcal{A}$ and all $\alpha, \beta \in \mathbb{C}$, (1) $(A^*)^* = A$, (2) $(AB)^* = B^*A^*$, (3) $(\alpha A + \beta B)^* = \bar{\alpha}A^* + \bar{\beta}B^*$. An element $A \in \mathcal{A}$ with $A = A^*$ is called self-adjoint. An algebra with an involution is referred to as a *-algebra.

Definition A.1.4 An algebra $\mathcal{A}$ is a normed algebra if it is equipped with a norm $\|\cdot\|$ which is, in addition to the usual properties of the norm, submultiplicative, i.e., $\|AB\| \leq \|A\|\,\|B\|$ for any $A, B \in \mathcal{A}$. If a normed *-algebra $\mathcal{A}$ is a complete normed space, and if the norm has the property $\|A^*\| = \|A\|$ for any $A \in \mathcal{A}$, then $\mathcal{A}$ is called a Banach *-algebra.

Definition A.1.5 A Banach *-algebra $\mathcal{A}$ is called a C*-algebra if its norm has the C*-norm property $\|A^*A\| = \|A\|^2$.
In this case, the requirement that the involution on $\mathcal{A}$ be isometric with respect to the norm is redundant, since this property follows from the C*-norm property and the submultiplicativity of the norm: $\|A\|^2 = \|A^*A\| \leq \|A^*\|\,\|A\|$ gives $\|A\| \leq \|A^*\|$, and replacing $A$ by $A^*$ gives the reverse inequality.

There are two classic examples of C*-algebras. (1) Let $\mathcal{X}$ be a compact Hausdorff space, and let $C(\mathcal{X})$ be the set of all continuous complex-valued functions on $\mathcal{X}$. Then $C(\mathcal{X})$ is a C*-algebra with the operations defined pointwise, $(f + g)(x) := f(x) + g(x)$, $(fg)(x) := f(x)g(x)$, and $f^*(x) := \overline{f(x)}$, and the norm $\|f\| := \sup_{x \in \mathcal{X}} |f(x)|$. The C*-algebra $C(\mathcal{X})$ is an abelian algebra. (2) Let $\mathcal{H}$ be a Hilbert space, and let $\mathcal{B}(\mathcal{H})$ be the set of all bounded operators acting on $\mathcal{H}$. Then $\mathcal{B}(\mathcal{H})$ is a C*-algebra with the usual sum and product operations, and the involution given by the Hilbert-space (Hermitian) adjoint. The norm is the operator norm $\|A\| := \sup_{\psi \in \mathcal{H},\, \|\psi\| = 1} \|A\psi\|$. The C*-algebra $\mathcal{B}(\mathcal{H})$ is noncommutative whenever $\dim \mathcal{H} \geq 2$. Both of these algebras are algebras with identity; in the first case, the identity is the constant function 1, and, in the second case, the identity is the identity operator, $1\psi = \psi$ for all $\psi \in \mathcal{H}$.
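For the second example with $\mathcal{H} = \mathbb{C}^5$, the C*-norm property, submultiplicativity, and the isometry of the adjoint can be spot-checked on random matrices. The operator norm of a matrix is its largest singular value, which NumPy returns as `np.linalg.norm(T, 2)`; the seed and matrix size are arbitrary, and a check of this kind illustrates the identities rather than proving them.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))


def op_norm(T):
    """Operator norm on C^n: the largest singular value."""
    return np.linalg.norm(T, 2)


A_star = A.conj().T   # Hilbert-space adjoint, the involution on B(C^5)

c_star_gap = abs(op_norm(A_star @ A) - op_norm(A) ** 2)        # C*-norm property
submult_ok = op_norm(A @ B) <= op_norm(A) * op_norm(B) + 1e-12  # submultiplicativity
isometric_gap = abs(op_norm(A_star) - op_norm(A))               # ||A*|| = ||A||
print(c_star_gap, submult_ok, isometric_gap)
```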

It turns out that these two examples are already exhaustive in the following sense. A theorem of Gelfand and Naimark asserts that, for any C*-algebra $\mathcal{A}$, there exists a Hilbert space $\mathcal{H}$ such that $\mathcal{A}$ is isometrically *-isomorphic to a norm-closed *-subalgebra of $\mathcal{B}(\mathcal{H})$. Furthermore, according to a theorem of Gelfand, any commutative C*-algebra $\mathcal{A}$ is isomorphic to the algebra $C_0(\mathcal{X})$ of complex-valued continuous functions that vanish at infinity on some locally compact Hausdorff space $\mathcal{X}$. [This algebra is a more general object than $C(\mathcal{X})$ defined above, but, whenever $\mathcal{A}$ has an identity, the space $\mathcal{X}$ will automatically be compact.] From now on we assume that all the C*-algebras with which we are dealing have an identity.

A.2 States, representations, and the GNS construction

Definition A.2.1 An element $A$ of a C*-algebra $\mathcal{A}$ is called invertible if there exists an element $A^{-1}$, called the inverse of $A$, such that $AA^{-1} = A^{-1}A = I$. The resolvent set of $A$, denoted by $r(A)$, is the subset of $\mathbb{C}$ consisting of all complex numbers $\lambda$ such that $A - \lambda I$ has an inverse. The spectrum of $A$, denoted by $\sigma(A)$, is the complement of $r(A)$ in $\mathbb{C}$, $\sigma(A) := \mathbb{C} \setminus r(A)$.

It is a celebrated result in spectral analysis that the spectrum of any $A \in \mathcal{A}$ is a nonempty compact set. Moreover, the spectral radius of $A$, defined as $\sup_{\lambda \in \sigma(A)} |\lambda|$, does not exceed the norm of $A$.

In case of a self-adjoint $A \in \mathcal{A}$, the spectrum $\sigma(A)$ is contained within the interval $[-\|A\|, \|A\|]$ of the real line, and the spectral radius of $A$ equals $\|A\|$. A self-adjoint element $A$ of a C*-algebra $\mathcal{A}$ is called positive (this is denoted by $A \geq 0$) if $\sigma(A) \subset [0, \|A\|]$. An element $A \in \mathcal{A}$ is positive if and only if there exists some $B \in \mathcal{A}$ such that $A = B^*B$.
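For matrices these statements are easy to see numerically. In the sketch below (random seed and sizes are arbitrary), $A = B^*B$ has its spectrum in $[0, \|A\|]$ with largest point equal to $\|A\|$, while a nilpotent, non-self-adjoint matrix shows that for general elements the spectral radius can be strictly smaller than the norm.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B.T @ B                       # A = B*B, hence self-adjoint and positive

norm_A = np.linalg.norm(A, 2)     # operator norm
spec_A = np.linalg.eigvalsh(A)    # spectrum of the self-adjoint matrix A
print(spec_A.min() >= -1e-10, np.isclose(spec_A.max(), norm_A))

nilpotent = np.array([[0.0, 1.0], [0.0, 0.0]])        # not self-adjoint
radius = np.max(np.abs(np.linalg.eigvals(nilpotent)))  # spectral radius; the spectrum is {0}
print(radius, np.linalg.norm(nilpotent, 2))            # radius 0 versus operator norm 1
```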

After these preliminaries, we can define a state over a C*-algebra. 

Definition A.2.2 A state over a C*-algebra $\mathcal{A}$ is a normalized positive linear functional $\omega$ over $\mathcal{A}$, i.e., $\omega(1) = 1$ and $\omega(A^*A) \geq 0$ for any $A \in \mathcal{A}$. The set $S(\mathcal{A})$ of all states over a C*-algebra $\mathcal{A}$ is a convex set, and its extreme points are referred to as pure states.
A canonical example of a state over a C*-algebra is furnished by considering the algebra $\mathcal{B}(\mathcal{H})$ of bounded operators on some Hilbert space $\mathcal{H}$. Let $\psi \in \mathcal{H}$ be a unit vector, and define the linear functional $\omega_\psi(A) := \langle\psi|A\psi\rangle$. It is quite easy to see that $\omega_\psi$ is a state: $\omega_\psi(1) = \|\psi\|^2 = 1$ and $\omega_\psi(A^*A) = \|A\psi\|^2 \geq 0$. A state defined in this way is called a vector state. It is the gist of the famous Gelfand-Naimark-Segal (GNS) construction that any state over a C*-algebra $\mathcal{A}$ has the form of a vector state over a C*-subalgebra of $\mathcal{B}(\mathcal{H})$ for some Hilbert space $\mathcal{H}$. For this reason, the GNS construction is central to C*-algebraic quantum theory. However, in order to state it properly, we first have to introduce some additional machinery.

Definition A.2.3 A *-homomorphism between C*-algebras $\mathcal{A}$ and $\mathcal{B}$ is a mapping $\pi : \mathcal{A} \to \mathcal{B}$ such that, for all $A, B \in \mathcal{A}$ and all $\alpha, \beta \in \mathbb{C}$, (1) $\pi(\alpha A + \beta B) = \alpha\pi(A) + \beta\pi(B)$, (2) $\pi(AB) = \pi(A)\pi(B)$, (3) $\pi(A^*) = \pi(A)^*$. In other words, a *-homomorphism between two C*-algebras is a mapping that preserves the C*-algebraic structure. A bijective *-homomorphism is referred to as a *-isomorphism.

Any *-homomorphism maps positive elements to positive elements because $\pi(A^*A) = \pi(A^*)\pi(A) = \pi(A)^*\pi(A) \geq 0$, and is also continuous: $\|\pi(A)\| \leq \|A\|$.

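Definition A.2.4 A representation of a C*-algebra $\mathcal{A}$ is a pair $(\mathcal{H}, \pi)$, where $\mathcal{H}$ is a Hilbert space and $\pi$ is a *-homomorphism of $\mathcal{A}$ into $\mathcal{B}(\mathcal{H})$. The representation $(\mathcal{H}, \pi)$ of $\mathcal{A}$ is called faithful if $\ker\pi := \{A \in \mathcal{A} \,|\, \pi(A) = 0\}$ is trivial.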

From now on, when we talk about representations, we will omit the mention of the Hilbert space $\mathcal{H}$ whenever it is clear from the context which Hilbert space we are talking about.

Definition A.2.5 A vector $\Omega \in \mathcal{H}$ is called a cyclic vector for the representation $(\mathcal{H}, \pi)$ of a C*-algebra $\mathcal{A}$ if the set $\{\pi(A)\Omega \,|\, A \in \mathcal{A}\}$ is dense in $\mathcal{H}$, i.e., if for any $\phi \in \mathcal{H}$ and any $\epsilon > 0$, there exists some $A \in \mathcal{A}$ such that $\|\phi - \pi(A)\Omega\| < \epsilon$. The triple $(\mathcal{H}, \pi, \Omega)$ is called a cyclic representation of $\mathcal{A}$. The representation $(\mathcal{H}, \pi)$ is called irreducible if every nonzero vector $\psi \in \mathcal{H}$ is cyclic for $\pi$ or, equivalently, if the only closed invariant subspaces of the set $\pi(\mathcal{A}) := \{\pi(A) \,|\, A \in \mathcal{A}\}$ are $\{0\}$ and $\mathcal{H}$. Otherwise, the representation is called reducible.

It can be shown that any nondegenerate representation of a C*-algebra as an algebra of operators on a Hilbert space decomposes into a direct sum of cyclic representations; when the Hilbert space is finite-dimensional, it decomposes further into a direct sum of irreducible representations. In general, a set $\mathcal{M}$ of bounded operators on a Hilbert space $\mathcal{H}$ is called irreducible if it has no nontrivial closed invariant subspaces. Thus we can say that the representation $(\mathcal{H}, \pi)$ of a C*-algebra $\mathcal{A}$ is irreducible if and only if the set $\pi(\mathcal{A})$ is irreducible.

A useful irreducibility criterion is provided by Schur's lemma, which states that a set $\mathcal{M} \subset \mathcal{B}(\mathcal{H})$ which is self-adjoint (i.e., closed under the operation of taking the adjoint) is irreducible if and only if the commutant of $\mathcal{M}$, i.e., the set $\mathcal{M}' := \{X \in \mathcal{B}(\mathcal{H}) \,|\, [X, M] = 0 \ \forall M \in \mathcal{M}\}$, consists only of complex multiples of the identity operator (this is written as $\mathcal{M}' = \mathbb{C}I$). Thus the representation $(\mathcal{H}, \pi)$ of a C*-algebra $\mathcal{A}$ is irreducible if and only if $\pi(\mathcal{A})' = \mathbb{C}I$.
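In finite dimensions Schur's criterion gives a practical irreducibility test: the commutant of a finite set $\{G_1, G_2, \ldots\}$ is the null space of the linear map $X \mapsto ([X, G_1], [X, G_2], \ldots)$, so its dimension can be read off from a matrix rank. The hypothetical helper below uses row-major vectorization. The self-adjoint set $\{\sigma^x, \sigma^z\}$ generates all $2 \times 2$ matrices and has commutant $\mathbb{C}I$; a single diagonal matrix with distinct entries commutes with every diagonal matrix and is therefore reducible.

```python
import numpy as np


def commutant_dimension(generators):
    """dim {X : [X, G] = 0 for every G}, from the null space of X -> [X, G].

    Uses row-major vectorization, for which vec(A X B) = (A kron B^T) vec(X).
    """
    n = generators[0].shape[0]
    I = np.eye(n)
    L = np.vstack([np.kron(I, G.T) - np.kron(G, I) for G in generators])
    return n * n - np.linalg.matrix_rank(L)


X = np.array([[0.0, 1.0], [1.0, 0.0]])   # sigma^x
Z = np.diag([1.0, -1.0])                 # sigma^z
D = np.diag([1.0, 2.0, 3.0])

print(commutant_dimension([X, Z]))   # 1: commutant is C I, irreducible
print(commutant_dimension([D]))      # 3: commutant is all diagonal matrices, reducible
```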

Now we are ready to state the theorem which is the essence of the GNS construction. 
Theorem A.2.6 (Gelfand-Naimark-Segal) Let $\omega$ be a state over the C*-algebra $\mathcal{A}$. Then there exists a cyclic representation $(\mathcal{H}, \pi, \Omega)$ of $\mathcal{A}$ such that

$$\omega(A) = \langle\Omega|\pi(A)\Omega\rangle$$

for all $A \in \mathcal{A}$, and $\Omega$ is a unit vector. This representation, to which we will refer as the GNS representation of $\mathcal{A}$ associated with $\omega$, is unique up to unitary equivalence.
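For a state on the matrix algebra $\mathcal{B}(\mathbb{C}^n)$ given by a density matrix, $\omega(A) = \mathrm{tr}(\rho A)$, a well-known concrete GNS representation takes $\mathcal{H}$ to be the space of $n \times n$ matrices with the Hilbert-Schmidt inner product, $\pi(A)X = AX$, and $\Omega = \rho^{1/2}$; then $\langle\Omega|\pi(A)\Omega\rangle = \mathrm{tr}(\rho^{1/2} A \rho^{1/2}) = \mathrm{tr}(\rho A)$, and $\Omega$ is cyclic whenever $\rho$ has full rank. The sketch below checks this identity numerically; the dimension, the random seed, and the helper names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
rho = M @ M.conj().T
rho /= np.trace(rho).real          # density matrix; the state is omega(A) = tr(rho A)

w, V = np.linalg.eigh(rho)
Omega = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T   # cyclic vector Omega = rho^(1/2)


def pi(A):
    """GNS representation: left multiplication by A on the n x n matrices."""
    return lambda X: A @ X


def inner(X, Y):
    """Hilbert-Schmidt inner product <X|Y> = tr(X* Y)."""
    return np.trace(X.conj().T @ Y)


A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
omega_A = np.trace(rho @ A)
gns_A = inner(Omega, pi(A)(Omega))   # <Omega | pi(A) Omega>
print(omega_A, gns_A, inner(Omega, Omega))
```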

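Given a C*-algebra $\mathcal{A}$ and a state $\omega$, the corresponding GNS representation is irreducible if and only if $\omega$ is a pure state. This result has an interesting consequence for pure states over abelian C*-algebras. Namely, $\omega$ is a pure state over an abelian C*-algebra $\mathcal{A}$ if and only if $\omega(AB) = \omega(A)\omega(B)$ for all $A, B \in \mathcal{A}$. Indeed, let $(\mathcal{H}, \pi, \Omega)$ be the corresponding GNS representation. Since $\omega$ is a pure state, the representation is irreducible, and therefore $\pi(\mathcal{A})' = \mathbb{C}I$. But, because $\mathcal{A}$ is abelian, we have $\pi(\mathcal{A}) \subset \pi(\mathcal{A})'$, so every $\pi(A)$ is a multiple of the identity; then every subspace of $\mathcal{H}$ is invariant, and $\pi$ can be irreducible only if $\mathcal{H}$ is one-dimensional, i.e., isomorphic to $\mathbb{C}$. In that case $\pi(A)$ acts as multiplication by $\omega(A)$, so that $\omega(AB) = \langle\Omega|\pi(A)\pi(B)\Omega\rangle = \omega(A)\omega(B)$.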
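The factorization property can be seen directly on a finite set $\mathcal{X} = \{0, 1, 2, 3, 4\}$ (which is compact), where $C(\mathcal{X})$ is just $\mathbb{C}^5$ with pointwise operations and a state is integration against a probability vector. The pure states are the point evaluations. The sketch below compares one of them with the uniform mixture; the test function $f$ and the chosen point are illustrative.

```python
import numpy as np

f = np.arange(5.0)   # an element of C(X) for the finite space X = {0, 1, 2, 3, 4}


def state(weights):
    """omega(h) = sum_x weights[x] h(x) for a probability vector `weights`."""
    return lambda h: float(np.dot(weights, h))


point_eval = state(np.eye(5)[2])   # evaluation at x = 2: a pure state
uniform = state(np.full(5, 0.2))   # equal-weight mixture of all point evaluations

gaps = {name: omega(f * f) - omega(f) ** 2
        for name, omega in [("pure", point_eval), ("mixed", uniform)]}
print(gaps)
```

For the mixture, $\omega(f^2) - \omega(f)^2$ is the variance of $f$ under the uniform distribution, which equals 2 here, so the factorization fails; for the point evaluation it vanishes.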

A.3 Trace ideals of $\mathcal{B}(\mathcal{H})$

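Definition A.3.1 Let $\mathcal{A}$ be an algebra. A linear subspace $\mathcal{I}$ of $\mathcal{A}$ is called a two-sided ideal (or simply an ideal) of $\mathcal{A}$ if, for any $I \in \mathcal{I}$ and any $A \in \mathcal{A}$, the elements $AI$ and $IA$ are also in $\mathcal{I}$.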

In this section we will give a brief description of a class of ideals of the algebra $\mathcal{B}(\mathcal{H})$, the so-called trace ideals. For a well-written exposition of the theory of trace ideals, as well as its applications to mathematical physics, consult the text by Simon [123]; another good source is the classic monograph by Schatten [119].

The starting point in the theory of trace ideals is the concept of a compact operator. 

Definition A.3.2 A bounded operator $A \in \mathcal{B}(\mathcal{H})$ is called a finite-rank operator if it has finite-dimensional range. A bounded operator is called compact if it is a norm limit of finite-rank operators.

Let $\mathcal{C}(\mathcal{H})$ denote the set of all compact operators on $\mathcal{H}$. It is easy to see that $\mathcal{C}(\mathcal{H})$ is a two-sided ideal of $\mathcal{B}(\mathcal{H})$. Indeed, if $A$ is a finite-rank operator and $B$ is a bounded operator, then $AB$ and $BA$ are both finite-rank operators. Because $\mathcal{C}(\mathcal{H})$ is the norm closure of the set of all finite-rank operators, we see that $AB$ and $BA$ are compact whenever $A$ is compact and $B$ is bounded. In fact, when $\mathcal{H}$ is separable, any proper two-sided ideal of $\mathcal{B}(\mathcal{H})$ is a subset of $\mathcal{C}(\mathcal{H})$. Furthermore, we have the following key theorem.

Theorem A.3.3 Let $A$ be a compact operator. Then $A$ has a norm-convergent canonical expansion

$$A = \sum_{n=1}^{N} \mu_n(A)\, \langle\psi_n, \cdot\,\rangle\, \phi_n,$$

where $N$ is a nonnegative integer or infinity, each $\mu_n(A) > 0$ with $\mu_1(A) \geq \mu_2(A) \geq \cdots$, and $\{\psi_n\}$ and $\{\phi_n\}$ are (not necessarily complete) orthonormal sets. Moreover, the numbers $\mu_n(A)$, called the singular values of $A$, are the nonzero eigenvalues of $|A| := (A^*A)^{1/2}$, arranged in descending order and repeated according to multiplicity.
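In finite dimensions every operator is compact and the canonical expansion is the singular value decomposition. The sketch below applies `np.linalg.svd` to an illustrative $6 \times 6$ matrix of rank 4: it rebuilds $A$ from the expansion, compares the singular values with the eigenvalues of $|A| = (A^*A)^{1/2}$, and checks that truncating the expansion after $r$ terms leaves an operator-norm error equal to $\mu_{r+1}(A)$, which is how compact operators arise as norm limits of finite-rank ones.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 6))   # a rank-4 operator on R^6

# A = U diag(mu) Vh, i.e. A = sum_n mu_n <psi_n, .> phi_n with phi_n = U[:, n], psi_n = Vh[n]
U, mu, Vh = np.linalg.svd(A)
rank = int(np.sum(mu > 1e-10 * mu[0]))
expansion = sum(mu[n] * np.outer(U[:, n], Vh[n]) for n in range(rank))

# Singular values = nonzero eigenvalues of |A| = (A* A)^(1/2), in descending order.
abs_eigs = np.sqrt(np.clip(np.linalg.eigvalsh(A.T @ A), 0.0, None))[::-1][:rank]

# Truncating after r terms gives a finite-rank approximant with operator-norm error mu_{r+1}.
r = 2
truncated = sum(mu[n] * np.outer(U[:, n], Vh[n]) for n in range(r))
trunc_err = np.linalg.norm(A - truncated, 2)
print(rank, trunc_err, mu[r])
```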

Now suppose that we are given a compact operator $A$. For $p = 1, 2, \ldots$, define the Schatten $p$-norm of $A$ as

$$\|A\|_p := (\mathrm{tr}\,|A|^p)^{1/p} = \Big(\sum_n \mu_n(A)^p\Big)^{1/p}.$$

Then $A$ is said to belong to the Schatten $p$-class if $\|A\|_p$ is finite; in this case we will write $A \in \mathcal{T}_p(\mathcal{H})$. It can be shown that, for any $p$, the Schatten $p$-class $\mathcal{T}_p(\mathcal{H})$ is a two-sided ideal of $\mathcal{B}(\mathcal{H})$; moreover, $\mathcal{T}_p(\mathcal{H})$ is the closure of the finite-rank operators in the Schatten $p$-norm.
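Since $\mathrm{tr}\,|A|^p = \sum_n \mu_n(A)^p$, the Schatten norms can be computed directly from the singular values. The hypothetical helper `schatten_norm` below does this; the checks compare $p = 2$ with $[\mathrm{tr}(A^*A)]^{1/2}$ and $p = 1$ with NumPy's nuclear norm, and exhibit the ordering $\|A\|_1 \geq \|A\|_2 \geq \|A\|$ on a random matrix.

```python
import numpy as np


def schatten_norm(A, p):
    """||A||_p = (tr |A|^p)^(1/p), computed from the singular values of A."""
    mu = np.linalg.svd(A, compute_uv=False)
    return np.sum(mu ** p) ** (1.0 / p)


rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

trace_norm = schatten_norm(A, 1)
hs_norm = schatten_norm(A, 2)
frobenius = np.sqrt(np.trace(A.conj().T @ A).real)   # [tr(A* A)]^(1/2)
print(trace_norm, hs_norm, frobenius, np.linalg.norm(A, 2))
```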

We are mainly interested in the cases $p = 1$ and $p = 2$. Let us look at the first case,
where we have the norm

$$\|A\|_1 := \operatorname{tr} |A|,$$

referred to as the trace norm. The Schatten 1-class $\mathcal{T}_1(\mathcal{H})$ is also referred to as the trace
class, and any operator $A \in \mathcal{T}_1(\mathcal{H})$ is called a trace-class operator. For a self-adjoint trace-class
operator $A$ we also have $|\operatorname{tr} A| \le \|A\|_1$. Furthermore, for any trace-class operator $A$
and any bounded operator $B$, we have the inequalities

$$\|AB\|_1 \le \|B\|\,\|A\|_1, \qquad \|BA\|_1 \le \|B\|\,\|A\|_1,$$

where $\|B\|$ denotes the operator norm of $B$. These can, of course, be taken as an indication that the
trace class is a two-sided ideal of $\mathcal{B}(\mathcal{H})$.
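
The trace-norm inequalities can be spot-checked on random matrices. This sketch assumes NumPy and uses $5 \times 5$ complex matrices as stand-ins for $A$ and $B$; the sample size and random seed are arbitrary choices. It relies on two facts about `numpy.linalg.norm`: `ord="nuc"` returns the sum of the singular values ($\|A\|_1$) and `ord=2` returns the largest singular value (the operator norm $\|B\|$).

```python
import numpy as np

def trace_norm(A):
    """||A||_1 = tr|A|, the sum of the singular values (NumPy's nuclear norm)."""
    return np.linalg.norm(A, "nuc")

def op_norm(B):
    """||B||, the largest singular value (NumPy's matrix 2-norm)."""
    return np.linalg.norm(B, 2)

rng = np.random.default_rng(1)
n = 5
worst = 0.0   # largest observed ratio ||AB||_1 / (||B|| ||A||_1) or ||BA||_1 / (||B|| ||A||_1)
for _ in range(1000):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    bound = op_norm(B) * trace_norm(A)
    worst = max(worst, trace_norm(A @ B) / bound, trace_norm(B @ A) / bound)
    H = A + A.conj().T                      # a self-adjoint operator
    assert abs(np.trace(H)) <= trace_norm(H) * (1 + 1e-12)
print(worst)  # should not exceed 1
```

A random test cannot prove the inequalities, of course; it only illustrates them and would catch a misremembered constant.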

When $p = 2$, the corresponding $p$-norm,

$$\|A\|_2 := \left( \operatorname{tr} |A|^2 \right)^{1/2} = \left[ \operatorname{tr} (A^*A) \right]^{1/2},$$

is called the Hilbert-Schmidt norm, and the space $\mathcal{T}_2(\mathcal{H})$ is called the space of the Hilbert-Schmidt
operators. The Hilbert-Schmidt norm is the norm induced by the inner product
$\langle A, B \rangle := \operatorname{tr}(A^*B)$, and it can be shown that the space of the Hilbert-Schmidt operators is a Hilbert
space. The product of a Hilbert-Schmidt operator $A$ and a bounded operator $B$, in any
order, is again a Hilbert-Schmidt operator with

$$\|AB\|_2,\ \|BA\|_2 \le \|B\|\,\|A\|_2.$$

This shows that the Hilbert-Schmidt space $\mathcal{T}_2(\mathcal{H})$ is a two-sided ideal of $\mathcal{B}(\mathcal{H})$.
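
In a finite-dimensional space the Hilbert-Schmidt inner product $\operatorname{tr}(A^*B)$ is just the entrywise sum $\sum_{ij} \overline{A_{ij}} B_{ij}$, so the Hilbert-Schmidt norm is the familiar Frobenius norm. The sketch below, assuming NumPy and random $4 \times 4$ complex matrices (the helper names are hypothetical), checks this identification, the agreement with the Schatten 2-norm, and the product bound $\|AB\|_2, \|BA\|_2 \le \|B\|\,\|A\|_2$.

```python
import numpy as np

def hs_inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr(A* B)."""
    return np.trace(A.conj().T @ B)

def hs_norm(A):
    """||A||_2 = [tr(A* A)]^(1/2)."""
    return np.sqrt(np.real(hs_inner(A, A)))

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))

print(np.isclose(hs_inner(A, B), np.vdot(A, B)))         # tr(A* B) = sum conj(A_ij) B_ij
print(np.isclose(hs_norm(A), np.linalg.norm(A, "fro")))  # HS norm = Frobenius norm
sv = np.linalg.svd(A, compute_uv=False)
print(np.isclose(hs_norm(A), np.sqrt(np.sum(sv ** 2))))  # = Schatten 2-norm
print(hs_norm(A @ B) <= np.linalg.norm(B, 2) * hs_norm(A))
print(hs_norm(B @ A) <= np.linalg.norm(B, 2) * hs_norm(A))
```

Each line should print `True`; `np.vdot` flattens both arrays and conjugates its first argument, which is exactly the entrywise form of $\operatorname{tr}(A^*B)$.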

A.4 Fixed-point theorems

Let $\mathcal{X}$ be a metric space with the metric $d(\cdot, \cdot)$. An operator $A : \mathcal{X} \to \mathcal{X}$ is called a
contraction if $d(Ax, Ay) \le d(x, y)$ for all $x, y \in \mathcal{X}$, and a strict contraction if there
exists some $k \in [0, 1)$ such that $d(Ax, Ay) \le k\, d(x, y)$ for all $x, y \in \mathcal{X}$. If $\mathcal{X}$ is a complete metric space,
then the contraction mapping principle [108] states that any strict contraction $A$ on $\mathcal{X}$
has a unique fixed point. In other words, the problem $Ax = x$ has a unique solution in
$\mathcal{X}$. If $\mathcal{Y}$ is a closed subset of $\mathcal{X}$, then $\mathcal{Y}$ is itself complete, and it follows that any strict
contraction $A : \mathcal{Y} \to \mathcal{Y}$ has a unique fixed point in $\mathcal{Y}$.

Strict contractivity is a remarkably strong property. Indeed, if we pick any $y \in \mathcal{Y}$,
then the sequence of iterates $A^n y$ converges to the fixed point $y_0$ of $A$ exponentially fast,
because

$$d(A^n y, y_0) = d(A^n y, A^n y_0) \le k^n\, d(y, y_0). \qquad \text{(A.1)}$$

This fact is of tremendous use in numerical analysis when one wants to solve the fixed-point
problem $Ay = y$ by the iteration method with some initial guess $y$. If the operator
$A$ is a strict contraction on a closed subset of a complete metric space, then, for any choice
of $y$, the iteration method is guaranteed to zero in on the solution in $O(\log \epsilon^{-1})$ steps,
where $\epsilon$ is the desired precision: by Eq. (A.1), $d(A^n y, y_0) \le \epsilon$ as soon as
$n \ge \log\big(d(y, y_0)/\epsilon\big) / \log(1/k)$.
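
As a concrete instance of the iteration method, consider the hypothetical example $A = \cos$ on the closed interval $\mathcal{Y} = [0, 1] \subset \mathbb{R}$ (not taken from the text). Cosine maps $[0, 1]$ into $[\cos 1, 1] \subset [0, 1]$, and by the mean value theorem $|\cos x - \cos y| \le \sin(1)\,|x - y|$ there, so it is a strict contraction with $k = \sin 1 \approx 0.84$. Since $d(y, y_0) \le 1$ for any $y, y_0 \in [0, 1]$, Eq. (A.1) gives an a priori step count that grows like $\log \epsilon^{-1}$. The function name and its parameters are illustrative assumptions.

```python
import math

def iterate_to_fixed_point(f, y, k, eps, diameter):
    """Iterate y <- f(y) for a strict contraction f with constant k < 1.

    By Eq. (A.1), d(f^n(y), y0) <= k^n d(y, y0) <= k^n * diameter, so
    n = ceil(log(diameter / eps) / log(1 / k)) steps guarantee precision eps.
    """
    n_steps = max(0, math.ceil(math.log(diameter / eps) / math.log(1.0 / k)))
    for _ in range(n_steps):
        y = f(y)
    return y, n_steps

# cos is a strict contraction on the closed set [0, 1] with constant k = sin(1).
k = math.sin(1.0)
for eps in (1e-3, 1e-6, 1e-12):
    y, n = iterate_to_fixed_point(math.cos, 0.0, k, eps, diameter=1.0)
    print(f"eps={eps:.0e}  steps={n:3d}  y={y:.15f}  residual={abs(math.cos(y) - y):.1e}")
```

Each thousandfold reduction of $\epsilon$ adds the same number of steps (about $\log(10^3)/\log(1/\sin 1) \approx 40$), which is the $O(\log \epsilon^{-1})$ behavior. The bound is a worst case: near the fixed point the actual contraction factor is $\sin y_0 \approx 0.67$, so the iterates typically reach the requested precision sooner.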

It should be noted that existence and uniqueness of a fixed point of some operator $A$
are, by themselves, not sufficient to guarantee convergence of the sequence of iterates $A^n y$
for every point $y$ in the domain of $A$. Indeed, according to the Leray-Schauder-Tychonoff
theorem [108], any continuous map on a compact convex subset of a locally convex space
$\mathcal{X}$ has at least one fixed point. Furthermore, any weak contraction on a compact subset $C$
of a Banach space, i.e., a map $W : C \to C$ with the property $\|Wx - Wy\| < \|x - y\|$ for
all $x, y \in C$ with $x \ne y$, has a unique fixed point [J-'2n]. Neither result, however, provides a rate
at which the iterates approach the fixed point. The key to the rapid convergence in Eq. (A.1)
is the fact that a strict contraction $A : \mathcal{Y} \to \mathcal{Y}$ shrinks distances between points of $\mathcal{Y}$
uniformly, by the same factor $k < 1$.
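
A simple example, not from the text, shows why a fixed point alone does not force the iterates to converge. A rotation $R_\theta$ of the closed unit disk in $\mathbb{R}^2$ is a continuous map of a compact convex set into itself, so the Leray-Schauder-Tychonoff theorem applies. For $\theta$ not a multiple of $2\pi$ its only fixed point is the origin (since $\det(R_\theta - I) = 2 - 2\cos\theta \neq 0$). But $R_\theta$ preserves distances, so iterates started away from the origin circle around it forever. Multiplying by $k = 0.9$ turns it into a strict contraction, and the iterates then collapse onto the fixed point as in Eq. (A.1). The sketch assumes NumPy; the angle and the factor $0.9$ are arbitrary choices.

```python
import numpy as np

theta = 1.0  # rotation angle in radians; not a multiple of 2*pi
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def iterate(M, y, n):
    """Apply the linear map M to y, n times."""
    for _ in range(n):
        y = M @ y
    return y

y = np.array([1.0, 0.0])
# R maps the closed unit disk to itself and fixes only 0, yet it preserves
# distances, so the iterates stay on the unit circle and never converge.
print(np.linalg.norm(iterate(R, y, 1000)))        # 1 up to rounding
# 0.9 * R is a strict contraction with k = 0.9: distance to 0 is 0.9**n.
print(np.linalg.norm(iterate(0.9 * R, y, 1000)))  # about 0.9**1000
```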

Appendix B

$\mathcal{A}, \mathcal{B}$: operator algebras
$\mathcal{B}(\mathcal{H})$: the algebra of bounded operators on the Hilbert space $\mathcal{H}$
$\mathbb{C}\mathbb{1}$: the set of the complex multiples of the identity operator
$\mathcal{H}, \mathcal{K}$: Hilbert spaces
$\mathbb{1}$: identity operator
$\mathrm{id}$: identity mapping
$\mathrm{id}_n$: identity mapping on $M_n$
$M_n$: the algebra of $n \times n$ complex matrices
$M^T$: matrix transpose of $M$
$\rho, \sigma$: density operators
$\mathcal{S}(\mathcal{A})$: the state space of the C*-algebra $\mathcal{A}$
$\mathcal{S}(\mathcal{H})$: the set of density operators on $\mathcal{H}$
$\sigma_1, \sigma_2, \sigma_3$: Pauli matrices
$\mathcal{T}_p(\mathcal{H})$: Schatten $p$-class on $\mathcal{H}$
$|X|$: cardinality of the set $X$
$X^*$: operator adjoint to the operator $X$
$\mathcal{X}, \mathcal{Y}$: general (e.g., topological or measurable) spaces
$\bar{z}$: complex conjugate of $z \in \mathbb{C}$